ondrejbartas / redis-model-extension

MIT License
3 stars 2 forks source link

[SUGGESTION] How to store Array and Hash fields in redis. #13

Closed mrazicz closed 12 years ago

mrazicz commented 12 years ago

In current gem version Hash and Array fields are stored in redis as JSON. I know, JSON is great format and in many cases it's very usefull to use JSON for storing information. But in this case i think not.

Reasons?

So I suggest to use [...].to_s / {...}.to_s except #to_json. Same for getting values - except JSON.parse(...) use eval String.

What do you think about this?

mrazicz commented 12 years ago

I discussed with @mihu about this and we find out that this can be dangerous. So this is questionable - it's security vs. immutable data. Another way to have immutable data is to use marshalling but this may be also slow, even slower than #to_json. I hope, we find out the best solution to have immutable data without security impact.

ghost commented 12 years ago

Some numbers...

require 'benchmark'
require 'bundler/setup'
require 'yajl'
require 'json'

array = ['bar', 'baz', 3.14, 42] 

n = 100000

puts 'marshal'
Benchmark.bm(8) do |x| 
  x.report("eval")   do  
    n.times do
      array.to_s
    end 
  end 
  x.report("yajl") do
    n.times do
      Yajl::Encoder.encode(array)
    end 
  end 
  x.report("json") do
    n.times do
      JSON.generate(array)
    end 
  end 
  x.report("marshal") do
    n.times do
      Marshal.dump(array)
    end 
  end 
end

puts 'umarshal'
Benchmark.bm(8) do |x| 
  x.report("eval")   do  
    input = array.to_s
    n.times do
      eval(input)
    end 
  end 
  x.report("yajl") do
    input = Yajl::Encoder.encode(array)
    n.times do
      parser = Yajl::Parser.new
      io = StringIO.new(input)
      parser.parse(io)
    end 
  end 
  x.report("json") do
    input = JSON.generate(array)
    n.times do
      JSON.parse(input)
    end 
  end 
  x.report("marshal") do
    input = Marshal.dump(array)
    n.times do
      Marshal.load(input)
    end 
  end 
end

And result on my machine

marshal
               user     system      total        real
eval       0.470000   0.000000   0.470000 (  0.472167)
yajl       0.710000   0.010000   0.720000 (  0.725274)
json       3.760000   0.000000   3.760000 (  3.778370)
marshal    0.540000   0.000000   0.540000 (  0.549607)

umarshal
               user     system      total        real
eval       1.310000   0.000000   1.310000 (  1.309227)
yajl       1.070000   0.000000   1.070000 (  1.066879)
json       0.640000   0.000000   0.640000 (  0.640838)
marshal    0.400000   0.000000   0.400000 (  0.405064)
mrazicz commented 12 years ago

Wow, few weeks ago I did similar benchmark, but only with json and eval and eval was "the winner" ... I don't remember my data for testing (it seems I underestimate data choice for this) but according to @mihu benchmark eval is outsider. Yajl is pretty fast and marshal is also quite good.

Ok, in this issue I'm completely failed, eval isn't a good choice. But maybe marshalling is?

ondrejbartas commented 12 years ago

Nice benchmark,

but if we would like to have compatibility with already existing redis database, I recommend to not throw away JSON.

Hash is taken care by Hashr - you don't need to take care about symbol and string keys - you are accesing it by method calls:

filed_hash = { :foo => "bar" } -> filed_hash.foo or in deep hash by: filed_hash = { :foo => { :bar => "rab" } } -> filed_hash.foo.bar

But from the test it is clear that JSON sucks. But what about combining YAJL and JSON

YAJL for writing data to redis (0.725274) and JSON for readin it (0.640838)

This will make it about 4 times faster

Please give me your advices :-)

ghost commented 12 years ago

I think that yajl is slower because of that streaming api, ie. need to create parser and StringIO every time. No idea how to benchmark it without that penalty.

Anyway, I don't think that marshaling/unmarshaling arrays will consume larger amount of time from the whole Model.find call and friends.

mrazicz commented 12 years ago

Imho yajl is slower than JSON for small arrays. For example for array with one item on my machine I have this results:

marshal
             user     system      total        real
eval       0.330000   0.000000   0.330000 (  0.335806)
yajl       0.790000   0.020000   0.810000 (  0.822857)
json       0.710000   0.000000   0.710000 (  0.712618)
marshal    0.570000   0.000000   0.570000 (  0.579103)

umarshal
             user     system      total        real
eval       1.590000   0.000000   1.590000 (  1.595217)
yajl       0.980000   0.000000   0.980000 (  0.996526)
json       0.700000   0.010000   0.710000 (  0.725672)
marshal    0.480000   0.000000   0.480000 (  0.483441)

I think it's because of StringIO. But without StringIO -> Yajl::Parser.parse(input) yajl is faster for parsing JSON:

marshal
             user     system      total        real
eval       0.320000   0.010000   0.330000 (  0.326991)
yajl       0.800000   0.020000   0.820000 (  0.819138)
json       0.730000   0.000000   0.730000 (  0.742923)
marshal    0.570000   0.010000   0.580000 (  0.577866)
umarshal
             user     system      total        real
eval       1.610000   0.000000   1.610000 (  1.621248)
yajl       0.570000   0.010000   0.580000 (  0.587422)
json       0.710000   0.000000   0.710000 (  0.715024)
marshal    0.480000   0.000000   0.480000 (  0.482712)

This benchmark are for n=100000 and array = ["try"].

@ondrejbartas:

ondrejbartas commented 12 years ago

Hi,

I implemented YAJL as encoder & parser for Array and Hash objects but with marshal I have some problems: Created new branch marshal: https://github.com/ondrejbartas/redis-model-extension/tree/marshal

Please look at it, I don't know what I am making wrong.

Thank you

ghost commented 12 years ago

I think, this is the source of problems;-)

# lib/redis-model-extension/initialize.rb
149     # initialize instance
150     def initialize(args={})
151       args = HashWithIndifferentAccess.new(args)

HashWithIndifferentAccess is turning every sub hash into hash with string keys...

ondrejbartas commented 12 years ago

fuck, I thought it is working only for main hash not whole structure

ondrejbartas commented 12 years ago

Ok, I updated initialize method to use symbolize keys instead. But It helped only for one test in marshal branch...

It has 3 errors in save & destroy test...

ondrejbartas commented 12 years ago

BTW test speed by using JSON was on my machine 2.7s and by using YAJL is now 0.6s :)

ghost commented 12 years ago

diff --git a/lib/redis-model-extension/value_transform.rb b/lib/redis-model-extension/value_transform.rb
index fc5f0dc..20f3e25 100644
--- a/lib/redis-model-extension/value_transform.rb
+++ b/lib/redis-model-extension/value_transform.rb
@@ -40,7 +40,7 @@ module RedisModelExtension
       when :float then value.to_f
       when :bool then value.to_s.to_bool
       when :symbol then value.to_s.to_sym
-      when :marshal then value.is_a?(String) ? Marshal.load(IO.new(value)) : value
+      when :marshal then value.is_a?(String) ? Marshal.load(value) : value
       when :array then value.is_a?(String) ? Yajl::Parser.parse(value) : value
       when :hash then value.is_a?(String) ? Hashr.new(Yajl::Parser.parse(value)) : Hashr.new(value)
       when :time then value.is_a?(String) ? Time.parse(value) : value

This passes all tests for me

ondrejbartas commented 12 years ago

Another good thing on YAJL is that it fixes problems with UTF-8 to US ANSII encoding - I encounter problem in one application where I saved hash into json and couldn't parse it by normal json. Now with yajl it is working .-)

ondrejbartas commented 12 years ago

Thank you for helping me with this. Merged into master: 7314fe6b2020da888fe06b8378a58f22808a154a

mrazicz commented 12 years ago

Nice, thx :) Imho we can close this issue.