Closed mrazicz closed 12 years ago
I discussed with @mihu about this and we find out that this can be dangerous. So this is questionable - it's security vs. immutable data. Another way to have immutable data is to use marshalling but this may be also slow, even slower than #to_json. I hope, we find out the best solution to have immutable data without security impact.
Some numbers...
require 'benchmark'
require 'bundler/setup'
require 'yajl'
require 'json'
array = ['bar', 'baz', 3.14, 42]
n = 100000
puts 'marshal'
Benchmark.bm(8) do |x|
x.report("eval") do
n.times do
array.to_s
end
end
x.report("yajl") do
n.times do
Yajl::Encoder.encode(array)
end
end
x.report("json") do
n.times do
JSON.generate(array)
end
end
x.report("marshal") do
n.times do
Marshal.dump(array)
end
end
end
puts 'umarshal'
Benchmark.bm(8) do |x|
x.report("eval") do
input = array.to_s
n.times do
eval(input)
end
end
x.report("yajl") do
input = Yajl::Encoder.encode(array)
n.times do
parser = Yajl::Parser.new
io = StringIO.new(input)
parser.parse(io)
end
end
x.report("json") do
input = JSON.generate(array)
n.times do
JSON.parse(input)
end
end
x.report("marshal") do
input = Marshal.dump(array)
n.times do
Marshal.load(input)
end
end
end
And result on my machine
marshal
user system total real
eval 0.470000 0.000000 0.470000 ( 0.472167)
yajl 0.710000 0.010000 0.720000 ( 0.725274)
json 3.760000 0.000000 3.760000 ( 3.778370)
marshal 0.540000 0.000000 0.540000 ( 0.549607)
umarshal
user system total real
eval 1.310000 0.000000 1.310000 ( 1.309227)
yajl 1.070000 0.000000 1.070000 ( 1.066879)
json 0.640000 0.000000 0.640000 ( 0.640838)
marshal 0.400000 0.000000 0.400000 ( 0.405064)
Wow, few weeks ago I did similar benchmark, but only with json and eval and eval was "the winner" ... I don't remember my data for testing (it seems I underestimate data choice for this) but according to @mihu benchmark eval is outsider. Yajl is pretty fast and marshal is also quite good.
Ok, in this issue I'm completely failed, eval isn't a good choice. But maybe marshalling is?
Nice benchmark,
but if we would like to have compatibility with already existing redis database, I recommend to not throw away JSON.
Hash is taken care by Hashr - you don't need to take care about symbol and string keys - you are accesing it by method calls:
filed_hash = { :foo => "bar" } -> filed_hash.foo or in deep hash by: filed_hash = { :foo => { :bar => "rab" } } -> filed_hash.foo.bar
But from the test it is clear that JSON sucks. But what about combining YAJL and JSON
YAJL for writing data to redis (0.725274) and JSON for readin it (0.640838)
This will make it about 4 times faster
Please give me your advices :-)
I think that yajl is slower because of that streaming api, ie. need to create parser and StringIO every time. No idea how to benchmark it without that penalty.
Anyway, I don't think that marshaling/unmarshaling arrays will consume larger amount of time from the whole Model.find
call and friends.
Imho yajl is slower than JSON for small arrays. For example for array with one item on my machine I have this results:
marshal
user system total real
eval 0.330000 0.000000 0.330000 ( 0.335806)
yajl 0.790000 0.020000 0.810000 ( 0.822857)
json 0.710000 0.000000 0.710000 ( 0.712618)
marshal 0.570000 0.000000 0.570000 ( 0.579103)
umarshal
user system total real
eval 1.590000 0.000000 1.590000 ( 1.595217)
yajl 0.980000 0.000000 0.980000 ( 0.996526)
json 0.700000 0.010000 0.710000 ( 0.725672)
marshal 0.480000 0.000000 0.480000 ( 0.483441)
I think it's because of StringIO. But without StringIO -> Yajl::Parser.parse(input) yajl is faster for parsing JSON:
marshal
user system total real
eval 0.320000 0.010000 0.330000 ( 0.326991)
yajl 0.800000 0.020000 0.820000 ( 0.819138)
json 0.730000 0.000000 0.730000 ( 0.742923)
marshal 0.570000 0.010000 0.580000 ( 0.577866)
umarshal
user system total real
eval 1.610000 0.000000 1.610000 ( 1.621248)
yajl 0.570000 0.010000 0.580000 ( 0.587422)
json 0.710000 0.000000 0.710000 ( 0.715024)
marshal 0.480000 0.000000 0.480000 ( 0.482712)
This benchmark are for n=100000 and array = ["try"].
@ondrejbartas:
Hi,
I implemented YAJL as encoder & parser for Array and Hash objects but with marshal I have some problems: Created new branch marshal: https://github.com/ondrejbartas/redis-model-extension/tree/marshal
Please look at it, I don't know what I am making wrong.
Thank you
I think, this is the source of problems;-)
# lib/redis-model-extension/initialize.rb
149 # initialize instance
150 def initialize(args={})
151 args = HashWithIndifferentAccess.new(args)
HashWithIndifferentAccess
is turning every sub hash into hash with string keys...
fuck, I thought it is working only for main hash not whole structure
Ok, I updated initialize method to use symbolize keys instead. But It helped only for one test in marshal branch...
It has 3 errors in save & destroy test...
BTW test speed by using JSON was on my machine 2.7s and by using YAJL is now 0.6s :)
diff --git a/lib/redis-model-extension/value_transform.rb b/lib/redis-model-extension/value_transform.rb
index fc5f0dc..20f3e25 100644
--- a/lib/redis-model-extension/value_transform.rb
+++ b/lib/redis-model-extension/value_transform.rb
@@ -40,7 +40,7 @@ module RedisModelExtension
when :float then value.to_f
when :bool then value.to_s.to_bool
when :symbol then value.to_s.to_sym
- when :marshal then value.is_a?(String) ? Marshal.load(IO.new(value)) : value
+ when :marshal then value.is_a?(String) ? Marshal.load(value) : value
when :array then value.is_a?(String) ? Yajl::Parser.parse(value) : value
when :hash then value.is_a?(String) ? Hashr.new(Yajl::Parser.parse(value)) : Hashr.new(value)
when :time then value.is_a?(String) ? Time.parse(value) : value
This passes all tests for me
Another good thing on YAJL is that it fixes problems with UTF-8 to US ANSII encoding - I encounter problem in one application where I saved hash into json and couldn't parse it by normal json. Now with yajl it is working .-)
Thank you for helping me with this. Merged into master: 7314fe6b2020da888fe06b8378a58f22808a154a
Nice, thx :) Imho we can close this issue.
In current gem version Hash and Array fields are stored in redis as JSON. I know, JSON is great format and in many cases it's very usefull to use JSON for storing information. But in this case i think not.
Reasons?
So I suggest to use [...].to_s / {...}.to_s except #to_json. Same for getting values - except JSON.parse(...) use eval String.
What do you think about this?