ohler55 / oj

Optimized JSON
http://www.ohler.com/oj
MIT License
3.13k stars 252 forks source link

JSON.parse doesn't support object_class option after calling Oj.mimic_JSON() #239

Closed samnang closed 9 years ago

samnang commented 9 years ago

Look like after we call Oj.mimic_JSON(). JSON will be replaced with Oj, but some options in JSON doesn't support in Oj:

>> JSON.parse({a: 1}.to_json, object_class: OpenStruct)
=> #<OpenStruct a=1>

>> Oj.mimic_JSON()
=> JSON
>> JSON.parse({a: 1}.to_json, object_class: OpenStruct)
=> {"a"=>1}
ohler55 commented 9 years ago

Neither object_class or array_object are currently supported in mimic JSON mode. I have not been able to get the yardoc to pick up the JSON method documentation that describes the limit. The documentation at this point only exists at the bottom of the oj.c file.

samnang commented 9 years ago

Thanks, I saw that now. Should we close it or leave it here so if anyone has any idea to implement it?

ohler55 commented 9 years ago

I vote for closing. Until now no one has asked for it. Is it something you need or just checking for completeness?

samnang commented 9 years ago

Something I need, but it's not critical. Use it in my of my gem https://github.com/samnang/random_user_generator/blob/b112ba0107e43398d99c72cc5c63834be7de0827/lib/random_user_generator.rb#L18 , then when I use the gem in one of my rails project that have Oj.mimic_JSON(), it breaks because it return hash instead of openstruct's instance.

ohler55 commented 9 years ago

I assume results is a Hash. If that is the case then this

users   = JSON.parse(results.to_json, object_class: OpenStruct)

could be just this instead

users = OpenStruct.new(results)

Anyway, the reason object_class and array_class are not supported is that Oj makes use of the Ruby C API to add members to Hash and Array for performance reasons. Using a more generic object introduces a very large overhead. With enough demand I could put in checks and allow the performance to drop a bit but with the Oj's claim to fame being performance that doesn't seem like the right choice at this point.

samnang commented 9 years ago

OpenStruct.new(results) doesn't support to build nested automatically. Here is what I mean

users = OpenStruct.new(results)
users[0].name.last  # it will fail because name will be hash instead of object.

I totally agree on concerning about performance :+1:

ohler55 commented 9 years ago

true Have you taken a look at the Oj::Bag class? It isn't exactly what you are looking for but it might give you some ideas. I could also have that class or something similar take a nested Hash as an initialize option. Honestly though I've usually written my own methods and classes to follow a path instead. It ends up performing better. That might not be the case when compared to a struct though.

alxx commented 8 years ago

I have legacy code that treats JSON parsed code as an OpenStruct, and after migrating to Oj none of that works any longer. Is there any chance we'll be able to do a deep conversion of the parsed Hash into an OpenStruct? Thanks.

ohler55 commented 8 years ago

If I understand correctly you want all places where a Hash would be used to be an OpenStruct instead. Is that correct?

alxx commented 8 years ago

Well, optionally, of course.

At this moment I can replicate that behaviour using the recursive-open-struct gem, but I fear that loses me in performance whatever I gain by using Oj.

RecursiveOpenStruct.new(Oj.load_file(fname))

ohler55 commented 8 years ago

Have you done any benchmarks with flat OpenStructs? Compare OpenStruct.new() to RecursiveOpenStruct.new()? Possibly compare that to use the Oj::Bag class? If the Bag class is much faster I could make some changes that might help. I have to give it some thought though.

alxx commented 8 years ago

I haven't yet; I was just glad I found a solution. (Even so, RecursiveOpenStruct is far from perfect, as it leaves out some cases involving arrays, whereas object_class: OpenStruct in JSON worked perfectly, albeit slowly.)

OpenStruct.new doesn't really compare because it's shallow, so there's no point in including it in a benchmark, and I didn't investigate Oj::Bag.

ohler55 commented 8 years ago

Comparing RecursiveOpenStruct to OpenStruct would give us an idea about how much slower general code of RecursiveOpenStruct is even though it is not the final solution. Oj::Bag would tell us how much of a win writing special code in Oj would help. If it is only a few percent faster then using RecursiveOpenStruct would be just as good. I doubt that is the case though. Finally if a new option is added to Oj then the new option can be compared to the others to see how it stacks up.

alxx commented 8 years ago

OK, here it is. I'm using an 8 MB JSON file randomly created at json-generator.com

First, the regular JSON, with an IO read:

Benchmark.measure { 10.times { JSON.parse(File.open('/tmp/bigjson').read); nil } }
 => #<Benchmark::Tms:0x007f7fed23a5e0 @label="", @real=2.202982, @cstime=0.0, @cutime=0.0, @stime=0.098, @utime=2.096, @total=2.19> 

Then, the same JSON library but parsing into OpenStruct:

> Benchmark.measure { 10.times { JSON.parse(File.open('/tmp/bigjson').read, object_class: OpenStruct); nil } }
 => #<Benchmark::Tms:0x007f7fdbf63490 @label="", @real=14.854887, @cstime=0.0, @cutime=0.0, @stime=0.4, @utime=14.3, @total=14.8

As you can see, adding OpenStruct to the basic JSON gem slows it by a factor of about 7.

Next up, Oj. First, regular parsing:

> Benchmark.measure { 10.times { Oj.load_file('/tmp/bigjson'); nil } }
 => #<Benchmark::Tms:0x007f7fe6f70e28 @label="", @real=1.401675, @cstime=0.0, @cutime=0.0, @stime=0.06, @utime=1.32, @total=1.38>

But, unfortunately, coupled with RecursiveOpenStruct with 'recurse_over_arrays' set to true, it times out; there's probably a bug in ROS. Trying without 'recurse_over_arrays' yields:

> Benchmark.measure { 10.times { RecursiveOpenStruct.new(Oj.load_file('/tmp/bigjson')); nil } }
 => #<Benchmark::Tms:0x007f7fdd0711b0 @label="", @real=1.308962, @cstime=0.0, @cutime=0.0, @stime=0.06, @utime=1.25, @total=1.31>

Reasonable, but not deep enough so not very useful.

So ROS doesn't really add all that much to Oj, which is not bad news. (Or it completely kills it, which is not good news.)

As for Oj::Bag, I tried this: Oj::Bag.new(Oj.load_file('/tmp/bigjson')) but it crashed with a "is not a symbol" exception, so I couldn't benchmark it. Maybe I'm not bagging it right, or there's something fishy with the json file. I'm attaching it.

bigjson.zip

ohler55 commented 8 years ago

Your choice of benchmark tells me a bit more about how you are using Oj and others. That was not the way to make use of the Bag class. It is used to create a new Ruby class when loading a dumped object. Since you are not dumping but reading preexisting JSON the bag is not going to be a good example with the current version of Oj. It might be an option if I make changes that allow OpenStruct to be used though.

Let me give it some though as to how to give a higher performance work-around.

alxx commented 8 years ago

I'd never heard of Oj::Bag before you asked me, so that's all I could come up with by looking in the Oj source. And yes, I'm not dumping, my use case is strictly reading. The dumping speed isn't that important to me.

Thanks!

ohler55 commented 8 years ago

I had a chance to look at this in more detail this evening. Here are a couple of observations.

  1. OpenStruct is more expensive to populate from C as a Ruby method must be called for each set operations. That will most likely reduce any potential improvement. Not saying it will not help. It will but you are leaving a lot on the table.
  2. The Oj::saj parser could be used to do what you want. It might perform close to an OpenStruct specific set of changes in Oj itself.
  3. Depending on how much you can change your code, the Oj::Doc approach would be by far the most performant.

So it comes down to how much code you are willing to change and what is more important, performance or code changes. Can you give me a bit more background on those two options?

alxx commented 8 years ago

I bit the bullet and changed the code to use Hash keys instead of methods. So, as far as I'm concerned, there's no urgency anymore.

But I still believe there's a market for parsing JSON into structs (by which I mean objects with accessors, rather than Hashes). Most people who are aware it's an option with the native json gem, do it. To be able to do it quicker, using Oj, would be rather useful.

Thanks for being so responsive!

ohler55 commented 8 years ago

Maybe another mode then. Provide the class to use for populating instead of a Hash and it has to implement all the methods for setting with the hash keys as method names or rather key=. Sound right to you?

alxx commented 8 years ago

I'm not sure. Isn't that another way to reinvent OpenStruct? Can you give a hypothetical example?

ohler55 commented 8 years ago

I could certainly implement an object that behaves like OpenStruct. It would not be an OpenStruct though. Would that fit the bill? The solution would be a cross between the Oj::Doc and the regular parse. Maybe adding some additional method to Oj::Doc might work. I'll look at that too.

alxx commented 8 years ago

Sounds great! I'll watch this space for updates :)