ohler55 / oj

Optimized JSON
http://www.ohler.com/oj
MIT License
3.14k stars 251 forks source link

Some bugs #340

Closed stereobooster closed 7 years ago

stereobooster commented 7 years ago

It took me some time to pickup all settings. But I was able to setup it. And it is most compatible thing with Rails to_json, and much faster than Railsto_json. Thanks a lot for this gem.

My repo for benchmarks. During my tests found some bugs. Will report here:

Questions:

Suggestions:

Other notes: my OS is macOS Sierra.

+---------------------------------+---------------+------------------+------------------+---------------------------+
| class                           | JSON.generate | Oj.dump (object) | Oj.dump (compat) | Oj.dump (compat, as_json) |
+---------------------------------+---------------+------------------+------------------+---------------------------+
| Regexp                          | πŸ‘Œ            | ❌               | ❌               | πŸ‘Œ                        |
| FalseClass                      | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| NilClass                        | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| Object                          | ❌            | ❌               | πŸ‘Œ               | πŸ‘Œ                        |
| TrueClass                       | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| String                          | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| StringChinese                   | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| Numeric                         | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| Symbol                          | πŸ‘Œ            | ❌               | πŸ‘Œ               | πŸ‘Œ                        |
| Time                            | ❌            | ❌               | ❌               | πŸ‘Œ                        |
| Array                           | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| Hash                            | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| HashNotEmpty                    | πŸ‘Œ            | ❌               | πŸ‘Œ               | πŸ‘Œ                        |
| Date                            | πŸ‘Œ            | ❌               | πŸ‘Œ               | πŸ‘Œ                        |
| DateTime                        | ❌            | πŸ’€               | ❌               | πŸ‘Œ                        |
| Enumerable                      | ❌            | ❌               | ❌               | πŸ‘Œ                        |
| BigDecimal                      | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| BigDecimalInfinity              | ❌            | ❌               | ❌               | πŸ‘Œ                        |
| Struct                          | ❌            | ❌               | ❌               | πŸ‘Œ                        |
| Float                           | πŸ‘Œ            | πŸ‘Œ               | πŸ‘Œ               | πŸ‘Œ                        |
| FloatInfinity                   | πŸ’€            | ❌               | πŸ‘Œ               | πŸ‘Œ                        |
| Range                           | πŸ‘Œ            | ❌               | πŸ‘Œ               | πŸ‘Œ                        |
| Process::Status                 | ❌            | ❌               | ❌               | πŸ‘Œ                        |
| ActiveSupport::TimeWithZone     | ❌            | ❌               | ❌               | πŸ‘Œ                        |
| ActiveModel::Errors             | ❌            | πŸ’€               | πŸ’€               | πŸ‘Œ                        |
| ActiveSupport::Duration         | ❌            | ❌               | ❌               | πŸ‘Œ                        |
| ActiveSupport::Multibyte::Chars | πŸ‘Œ            | ❌               | ❌               | πŸ‘Œ                        |
| ActiveRecord::Relation          | ❌            | ❌               | ❌               | πŸ‘Œ                        |
+---------------------------------+---------------+------------------+------------------+---------------------------+
ruby benchmark2.rb
Calculating -------------------------------------
            to_json:   437.048M memsize (    30.968k retained)
                         8.341M objects (   117.000  retained)
                        50.000  strings (    42.000  retained)
                 Oj:   101.720M memsize (     0.000  retained)
                         1.590M objects (     0.000  retained)
                        38.000  strings (     0.000  retained)

Comparison:
                 Oj::  101720000 allocated
            to_json::  437048167 allocated - 4.30x more

---------------------------------------------

                     user     system      total        real
to_json:         3.410000   1.090000   4.500000 (  4.544330)
Oj:              0.720000   0.010000   0.730000 (  0.727759)
ohler55 commented 7 years ago

I like the benchmarks. I'm not sure I understand the table though. If it is a comparison to JSON how can JSON fail on some types?

When running mimic_JSON it sets up the defaults. I thought I had them correct but maybe not. I'll check against your settings.

I'll take a look at your tests as well to see why you are having issues with DateTime and ActiveModel::Errors. DateTime in particular seems odd as there are several unit test for that type.

stereobooster commented 7 years ago

Comparing to Rails to_json which is not the same as JSON.generate

ohler55 commented 7 years ago

Ah, got it, thanks. I followed reference you provided on some other page to look at rails as_json(). I noticed it too is deprecated but there is no suggested alternative. Do you happen to know what will replace it?

stereobooster commented 7 years ago

I haven't found official info yet. Maybe this one active_support/json/encoding.rb

stereobooster commented 7 years ago

Update:

And found some unsupported classes (at least some of them doesn't work)

Complex: Complex('0.3-0.5i'),
Exception: Exception.new,
OpenStruct: OpenStruct.new(:country => "Australia", :population => 20_000_000),
Rational: Rational(0.3),
ohler55 commented 7 years ago

Usually I would use mimic_JSON but in this case you are comparing JSON to OJ so that doesn't work. I have a few suggestions and there are some changes that need to be made to Oj.

You can get rid of the comparison with :object mode. It was never intended to be anything like the JSON gem. It is for efficient encoding and decoding. By the way, the DateTime worked fine for me here with your code with the datetime key compare removed.

Next the options should be like this:

OJ_2 = { mode: :compat, use_as_json: false, float_precision: 16, bigdecimal_as_decimal: false, nan: :null, time_format: :xmlschema, second_precision: 3 }
OJ_3 = { mode: :compat, use_as_json: true,  float_precision: 16, bigdecimal_as_decimal: false, nan: :null, time_format: :xmlschema, second_precision: 3 }

My first set of fixes for OJ are these but I will get to the others and figure out how to deal with the ActiveSupport as well. Maybe the as_json is th eonly way for those though.

 - round off of date when sec prec should be round, not cut off
 - for DateTime.new should include fraction
 - BigDecimal infinity should follow :nan option

There should be a new branch called

ohler55 commented 7 years ago

We might be using a different version of active support. I just installed the latest set. Using the oj/active_support_helper adds the datetime creation function so it might be replacing the existing one. Lets no do that for now.

if escape_modes fails when set to :xss_safe let not do that. :-)

If you keep your compatibility test up to date I'll keep pulling and we can work toward a 100% match on compat mode.

stereobooster commented 7 years ago

I updated test results

ohler55 commented 7 years ago

I'll keep updating the rail-compat branch. Should be able to get more positive results over the next few days.

stereobooster commented 7 years ago

I narrowed down problem with DateTime

if Rails::VERSION::MAJOR == 5 && RUBY_VERSION == "2.2.6"
  # see https://github.com/ohler55/oj/commit/050b4c70836394cffd96b63388ff0dedb8ed3558
  require 'oj/active_support_helper'
end
stereobooster commented 7 years ago

huh... I got an idea if we could get to 100% compatibility with rails, we can patch .to_json after rails again.

Make .to_json fast again

ohler55 commented 7 years ago

The tricky part is going to be the active support types I think. Maybe expanding the active_support_helper will take care of that though. Working together to hit 100% would be great.

stereobooster commented 7 years ago

From my POV real stoppers are

Plus there are potential incompatibilities with JSON gem, which is also support (got list from here):

But this can be tracked in other ticket.

ohler55 commented 7 years ago

Tracking here is fine. I'll be using this to get differences resolved. I have a day job so will work on it this evening. PST time.

ohler55 commented 7 years ago

Regexp now compatible and I added a new :unicode_xss escape mode to allow unicode but < > and some others like \u2028 are escaped.

Started on Enumerable. I discovered to_json ends up calling as_json and then drops back into Oj to encode. I'm not sure it makes sense to try and intercept that and do a direct encoding. If to_json uses as_json then maybe that is the correct approach for some of the more unusual types. What are your thoughts?

stereobooster commented 7 years ago

If to_json uses as_json then maybe that is the correct approach for some of the more unusual types. What are your thoughts?

This seems to be a reasonable approach here. And also it is pretty good in benchmarks. Feel a bit guilt, because you are doing all heavy lifting here. Will try to do something useful too.

ohler55 commented 7 years ago

You provide the knowledge of rails. Something I am lacking.

ohler55 commented 7 years ago

I assume purpose of this set of changes is to keep rails as fast as possible and thats why you have been trying to avoid as_json. Is that correct? If so it sounds like the approach should be use Oj first and then revert to as_json for the more complete objects. We have to figure out which category each type falls into.

stereobooster commented 7 years ago

I assume purpose of this set of changes is to keep rails as fast as possible

yes

and thats why you have been trying to avoid as_json

no. Thats why I was trying to avoid Rails implementation of to_json.

In rails project to_json is slower and consumes more memory than JSON.generate even if I use Oj.mimic_JSON. See:

Benchmark ```ruby require './test_data' require 'benchmark' begin require 'benchmark/memory' rescue LoadError => e end Oj.mimic_JSON() begin require 'json' rescue Exception end obj = TEST_DATA.dup obj.delete(:'ActiveModel::Errors') obj.delete(:'ActiveRecord::Relation') puts "\n" if Benchmark.respond_to?(:memory) Benchmark.memory do |x| x.report('to_json:'){ 10_000.times { obj.to_json } } Oj.default_options = OJ_1 x.report('Oj.dump o') { 10_000.times { Oj.dump(obj) } } Oj.default_options = OJ_2 x.report('Oj.dump c') { 10_000.times { Oj.dump(obj) } } Oj.default_options = OJ_3 x.report('Oj.dump c, aj') { 10_000.times { Oj.dump(obj) } } x.compare! end puts "---------------------------------------------\n\n" end Benchmark.benchmark(Benchmark::CAPTION, 14, Benchmark::FORMAT) do |x| x.report('to_json:'){ 10_000.times { obj.to_json } } Oj.default_options = OJ_1 x.report('Oj.dump o') { 10_000.times { Oj.dump(obj) } } Oj.default_options = OJ_2 x.report('Oj.dump c') { 10_000.times { Oj.dump(obj) } } Oj.default_options = OJ_3 x.report('Oj.dump c, aj') { 10_000.times { Oj.dump(obj) } } end puts "\n" ```
Results ``` Calculating ------------------------------------- to_json: 216.441M memsize ( 168.000 retained) 3.880M objects ( 2.000 retained) 50.000 strings ( 0.000 retained) Oj.dump o 55.880M memsize ( 0.000 retained) 990.000k objects ( 0.000 retained) 38.000 strings ( 0.000 retained) Oj.dump c 55.880M memsize ( 0.000 retained) 990.000k objects ( 0.000 retained) 38.000 strings ( 0.000 retained) Oj.dump c, aj 55.880M memsize ( 0.000 retained) 990.000k objects ( 0.000 retained) 38.000 strings ( 0.000 retained) Comparison: Oj.dump c, aj: 55880000 allocated Oj.dump o: 55880000 allocated - same Oj.dump c: 55880000 allocated - same to_json:: 216440720 allocated - 3.87x more --------------------------------------------- user system total real to_json: 1.430000 0.930000 2.360000 ( 2.368436) Oj.dump o 0.530000 0.010000 0.540000 ( 0.542466) Oj.dump c 1.090000 0.100000 1.190000 ( 1.188800) Oj.dump c, aj 0.390000 0.000000 0.390000 ( 0.399239) ```

And I found reason for this:

Reason ```ruby # this is what Rails do in rails/activesupport/lib/active_support/core_ext/object/json.rb module OjMimicTest def to_json(options = nil) p 1 super options end end [Object, Array, FalseClass, Float, Hash, Integer, NilClass, String, TrueClass, Enumerable].reverse_each do |klass| klass.prepend(OjMimicTest) end # mimic_JSON require 'oj' Oj.mimic_JSON() begin require 'json' rescue Exception end # testing {}.to_json # this will print 1 in console module OjMimicTest def to_json(options = nil) super options end end {}.to_json # this will not print 1 in console ```

I want achieve total compatibility and remove Rails magic, then it will be real improvement

stereobooster commented 7 years ago

I imagine something like this

def mimic_rails
  require 'oj'
  if Rails::VERSION::MAJOR == 5 && RUBY_VERSION == "2.2.6"
    # see https://github.com/ohler55/oj/commit/050b4c70836394cffd96b63388ff0dedb8ed3558
    require 'oj/active_support_helper'
  end

  Oj.default_options = { 
    float_precision: 16,
    bigdecimal_as_decimal: false,
    nan: :null,
    time_format: :xmlschema,
    second_precision: 3,
    escape_mode: :unicode_xss,
    mode: :compat, 
    use_as_json: true,
  }

  Oj.mimic_JSON()
  begin
    require 'json'
  rescue Exception
  end

  module ActiveSupport
    module ToJsonWithActiveSupportEncoder # :nodoc:
      def to_json(options = nil)
        super(options)
      end
    end
  end
end
ohler55 commented 7 years ago

Something I am a little surprised with is that Oj with as_json is the faster of the 3 Oj combinations. I wonder if that is due to the exceptions.

Anyway with the goal in mind of making rails faster maybe you can help with some temporarily modified tests. If the types that raise an exception are commented out, is straight compat without as_json faster than with as_json? What that will tell us is whether it is best to focus on eliminating as_json calls or to continue with them.

Assuming calls to as_json are expensive then an approach that avoid them when possible can be take. Maybe even a special class handler for the common active support types.

ohler55 commented 7 years ago

Nevermind on the request. I see in benchmark2.rb the two type that raised exceptions were removed.

My results with benchmark2.rb were slightly different on a Ubuntu machine with the rails-compat branch.

                     user     system      total        real
to_json:         1.620000   0.000000   1.620000 (  1.619461)
Oj.dump o        0.150000   0.000000   0.150000 (  0.144752)
Oj.dump c        0.290000   0.000000   0.290000 (  0.294706)
Oj.dump c, aj    0.240000   0.000000   0.240000 (  0.235439)
stereobooster commented 7 years ago

I pushed to compatibility test repo with rails-compat branch. For some reasons can't run benchmarks. Important note: I'm talking about speed, but it is actually memory and allocations initial issue for me. For example if you are using something like logstash-logger, which converts logs to JSON to send them, and you do a lot of logging you will end up with huge number of allocations and objects trashing.

Related: https://github.com/dwbutler/logstash-logger/pull/118

ohler55 commented 7 years ago

Interesting you mention using JSON for log entries. You might be interested in my piper-ruby gem, https://github.com/ohler55/piper-ruby. Super simple. Coupled with NATs for publishing and Piper for you get log files and the start of an ops console.

stereobooster commented 7 years ago

Found what is the problem with Rational. Oj in "rails compat" mode returns "\"5404319552844595\\/18014398509481984\"" while rails code doesn't escape /. It escapes only \u2028\u2029><&.

ohler55 commented 7 years ago

That is unfortunate since the JSON spec calls for it to be escaped. Maybe a note indicated that mode may generate invalid JSON.

stereobooster commented 7 years ago

In my tests .to_json and JSON.generate both generate same "wrong" JSON. Added StringSpecial2 ("\\/") to compatibility tests.

stereobooster commented 7 years ago

I can not insist to use this option by default, because there is JSON standard, but it would be nice at least have an option to change this

ohler55 commented 7 years ago

Okay, today's updates were for Rational, Struct, Enumerable, and the "\/" change for the unicode_xss escape mode.

I also figured out why as_json was faster than just compat. On some of the active types all the attributes were dumped which was more expensive than just the desired ones. That means avoiding as_json (or to_json) is the way to go. I'm going after the non-active ones first and will then decide how best to handle the active ones. Maybe if the ActiveXXX module then also do an as_json for that only. Might need a different rails mode or something.

With the active type commented out along with other non-passing I get this:

                     user     system      total        real
to_json:         1.290000   0.000000   1.290000 (  1.286998)
Oj.dump o        0.100000   0.000000   0.100000 (  0.099203)
Oj.dump c        0.080000   0.000000   0.080000 (  0.084050)
Oj.dump c, aj    0.150000   0.000000   0.150000 (  0.156359)
ohler55 commented 7 years ago

I haven't updated to use your latest tests since you mentioned the benchmarks were not working. If it working again let me know and I will do a pull.

stereobooster commented 7 years ago

I'm thinking of some hybrid mode, which will use Oj built in serialisation for known objects, like Regexp, Range etc. and will fallback to as_json otherwise.

Before adding even more options lets count what we have now:

Parsing: 4*4*2*2*2*2*2*2*2 * 2*2*4*2*2*5 = 655360 Serialization: 4*4*2*2*2*2*2*2*2 * 2*2*2*3*2*2*2*2 = 786432

For all non sets values I substituted 2 (for Numeric or Class params). Not every combination makes sense.

This is just side-note. This will not stop us from doing Rails compatible mode and add good test coverage for it

ohler55 commented 7 years ago

Yup, there are a lot of options. They grow over time as people ask for various features.

Back to the hybrid mode. I was thinking along similar lines but something like this in combat mode.

  1. Handle the easily detected type like string, hash, array, Regexp, Rational, Enumerable, etc.
  2. If an object check the root module. If Active then: 2a. If in a list of 'here is the attributes to encode' then use that. This is be faster than as_json. 2b. call as_json.

Note only the active ones will have as_json called. If non-active types should have as_json used then turn on the use_as_json option. This should keep compatibility with previous versions and non-rails use yet give a huge performance boost to rails. I am shooting for 10x over to_json will the full test types.

ohler55 commented 7 years ago

Range is now rails compatible.

Your tests made it very clear that to_json is not the same as rails. Maybe compat mode should be for to_json and a new rails mode for rails compatibility. Is that your understanding as well?

stereobooster commented 7 years ago

This is just meditation on the subject, do not take it as final judgment or critic.

Portability

Having global settings lowers portability. If two different libs assume different global settings, you won't be able to use them without modification.

Oj.dump(value) (or Oj.load) should always return the same value in all environments until you use the same version of a gem. Portability not guaranteed across different versions of gems.

If users want to have global settings, they can easily create wrapper class, wich will expose default settings behavior.

class OjGlobal
  class << self
    attr_accessor :default_options
  end

  def dump(value, options = {})
    Oj.dump(value, self.class.default_options.merge(options))
  end

  def load(values, options = {})
    Oj.load(value, self.class.default_options.merge(options))
  end
end

Compatibility

Oj.mimic_JSON will patch JSON object, and all objects which are patched by json gem.

JSON.parse (or JSON.generate) will behave the same way before and after the patch, e.g. it will produce same results, and accept same options, which are acceptable by json gem.

Same time Oj.dump doesn't have to behave the same way as JSON.generate before or after Oj.mimic_JSON call.

There is also will be Oj.mimic_rails, which will internally call Oj.mimic_JSON if it hasn't been called before and will patch the same set of objects which is patched by Rails and will remove Rails magic to improve speed.

After Oj.mimic_rails call .to_json (or JSON.generate) will behave the same way as before.

If some users want to change the default behavior of JSON or .to_json they can easily do it with built in Ruby facilities themselves.

Test coverage

By removing global settings we will simplify tests. We will need to test default mode, test compatibility for Oj.mimic_JSON and Oj.mimic_rails and simple unit tests for all possible combinations of second argument for Oj.dump and Oj.load.

Plus we have TravisCI setup with a matrix of all possible combinations of Rails and ruby. This will guarantee our users that they can safely plug Oj in covered Rails and ruby versions combinations without fear and get an instant speed boost and memory improvement.

What do you think?

ohler55 commented 7 years ago

I think if Oj was not trying to be compatible with the json gem and rails your approach make complete sense and Oj. actually started out without global options. As soon as rails support came into the picture people demanded some way of have setting globally so they did not have to either call with those setting every time or wrap every write wrappers which rails would then ignore. Right now moving to that model would be a major change in the API and I know it would break a lot of code out there.

Having said that you might prefer another direction I was thinking about. I was not planning on a mimic_rails but instead was thinking of coming up with a set of options that would make Oj rails compatible. By providing those in a function call like give_me_the_rails_options() that could be used as an argument to dump or generate as your describe or could be used as globals. That would leave the existing APIs in place with no change other than the addition of calls to get pre-defined option sets.

I'm afraid globals aren't going away but deprecating their use in favor of options passed in on each call might be a way forward. Comments?

Anyway, with the addition of yet another mode, i.e., rails I am going to take a day or two to refactor the dump code a bit. It will be cleaner and maybe even slightly faster.

stereobooster commented 7 years ago

I think if Oj was not trying to be compatible with the json gem

Generally yes. If you use standalone Oj, than it has nothing to do with JSON gem.

But if we talk about mimic_JSON, then it makes sense to think about compatibility. Not sure what is the point to replace existing interface with something incompatible. Yes existing interface maybe dumb, wrong, irrational or anything else. But this is how its done. We have versions of existing interface (ruby version), and we can track compatibility with different versions by stating ruby version in Gemspec. Anything wrong about current implementation of JSON gem can be reported as bug in ruby bugtracker.

You can think of mimic_JSON as separate layer (gem), which will bring compatibility with JSON gem, while Oj will be pure library with settings, but without globals.

I'm afraid globals aren't going away but deprecating their use in favor of options passed in on each call might be a way forward.

I'm generally not proposing to break everything right away. We can, for example, expose OjCore which will be like Oj but without globals, same time Oj will become wrapper for OjCore. Oj will have same interface as now plus deprecation notice for globals.

OjCore will be reused in compatibility modes for JSON and rails, but main idea here is that global settings of Oj will not influence compatibility modes. I guess this is breaking changes. But I'm ready to hear arguments, why somebody decided break compatibility in the first place. I mean you don't do things like this:

class Object
  def to_json
    "bla bla"
  end
end

Why would you chose to replace JSON methods with something incompatible.

My intention is to have: compatible layer (so you do not need to worry, if it works or not), reliable (having globals make situation worse), fully tested and transparent in sense of supported versions (of ruby and rails).


Note on as_json.

This proposal says:

JSON.parse (or JSON.generate) will behave the same way before and after the patch

After it will behave as it is expected to behave before. E.g. if a user will redefine as_json function for some built-in class, like Hash or Array, this will be not covered by Oj. But I do not see any problems to use as_json for everything else what is not built-in. Or I miss something and this will be huge performance drop even for built-in objects?

stereobooster commented 7 years ago

ohler55 commented 7 years ago

Nice xkcd comic.

Actually, the use of as_json or to_json is a significant performance hit. The tricky part is knowing when the user wants as_json called and when they don't. The user can add as_json to any class as evident with rails and active. The options use_to_json and use_as_json exist to provide a simple on/off for using those methods. What we are both talking about for rails is something that is not a simple on/off switch but some semi-intelligent approach to only using as_json when we don't want to mimic rails son encoding in Oj. Of course that means we will miss any user defined as_json unless the use_as_json switch is through or some kind of indicator is attached to the user class or in a global registry table. (a global table is faster and could be built from some indicator on the class)

I think we are agreeing on the direction Oj should move. The details of how to get there may differ a little but I think a lot of that is the way we are each describing the steps involved. Lets plan on getting a set of options that work for the JSON gem and rails and then see about which other functions should be exposed. You might be pleased to know that in the C code the options are passed around and the globals are only visible for a merge with the call options at the start so adding a new-non-global dump call would be trivial.

stereobooster commented 7 years ago

What we are both talking about for rails is something that is not a simple on/off switch but some semi-intelligent approach

I wouldn't call it "intelligent" while it is a simple matrix of covered and non-covered cases. Let's switch to exact use cases, this will make more transparent what I mean.

1) I want most fast JSON implementation, I'm aware that speed comes with the tradeoff of incompatibility. Solution: use Oj directly, choose settings manually according to your requirements.

2) I'm using JSON in Ruby (non-rails) project. I want faster implementation. Solution: use Oj.mimic_json right after you required json first time. No more worries for you.

3) Same situation as 2, but I also have this kind of code:

class FancyClass
  def as_json
    {bla: "bla"}
  end
end

Solution: same as previous.

4) Same situation as 2, but I also have this kind of code:

class Hash
  def as_json
    {bla: "bla"}
  end
end

Solution: we do not support this case out of the box. Build your own compatibility mode (using Ruby code, I do not see a need for C code here), use our code as an example or do not use oj.

5) I have Rails project. I want a faster implementation of JSON. Solution: use Oj.mimic_rails right after you required rails the first time. No more worries for you.

6) Same situation as 5, but I also have this kind of code:

class FancyClass
  def as_json
    {bla: "bla"}
  end
end

Solution: same as previous.

7) Same situation as 5, but I also have this kind of code:

class Hash
  def as_json
    {bla: "bla"}
  end
end

Solution: we do not support this case out of the box. Build your own compatibility mode (using the ruby code, I do not see the need of C code here), use our code as an example or do not use oj.


Do I miss something in sense of implementation? Maybe there are some technical problems in sense of C implementation which I'm not aware of?

stereobooster commented 7 years ago

We will not guess what user wants, we will limit the number of use cases to most reasonable. For all non-standard use cases, a user can get support with his own small wrapper code. The same situation as with CRA, if you use it as intended you get all benefits and updates. If you chose to eject you are on your own.

ohler55 commented 7 years ago

I am all for just addressing the primary use cases. I believe use case 4 and 7 are built into the json gem and rails though as they both monkey patch Hash as well as others. Cases 1, 2, and 3 are covered but FancyClass work only when as_json is allowed. With the current implementation that also allows all other as_json methods which means Oj always calls the json gem monkey patches and is slow as your benchmarks show.

A little history on the to_json and as_json. Initially those were called if the object had that method. That broke with rails because rails would immediately call generate and a loop occurred. as_json would just return itself so now not only is there a use_as_json flag but Oj has to detect when the call to as_json returned itself and does not allow as_json the second time. It really is a mess. I'm getting off topic though.

ok, use cases.

  1. You got it. For encode/decode object mode is most efficient.
  2. The indent is mimic_JSON should be just like the JSON gem. With you compatibility tests that will happen.
  3. Supported as you mentioned since the JSON gem and mimic should call as_json.
  4. Hash has an as_json so it should be called already. There is some special handling for this in Oj so unless it is broken this should work as the JSON gem already does this. Fairly sure it does anyway. Which would make this the same as case 3. No change from out-of-the-box.
  5. Thanks to your compatibility tests it has become clear the json gem and rails are not the same so yes a rails mimic is needed. Assume that this will happen as a result of the rail-compat effort we are working on now.
  6. The simple answer would be yes, supported. The question this brings up is, is that what you really want? Having Oj call as_json on any object that implements that method will be as slow as the to_json calls. If you want something fast with rails then this use case and case 7 are the keys.
  7. No different than use case 6. Supporting as_json in case 6 means Hash.as_json as well.
ohler55 commented 7 years ago

I wish it were that simple. Oj was originally a replacement serializer that was used to encode object and save to a database. That is the current object mode. It gave us a huge speedup. That would be OJ out-of-the-box with no options. To have stopped there and force users to write ruby code to be json gem compatible would have meant Oj would not have gather the large user base it currently has.

I hear and understand what you are saying. I just don't know how to get Oj to satisfy the various use case and also provide the high performance. You are giving me good insight into were the problems are though. We keep coming back to how as_json in handled.

stereobooster commented 7 years ago

A little history on the to_json and as_json. Initially those were called if the object had that method. That broke with rails because rails would immediately call generate and a loop occurred.

I suppose you are talking about this code. My propose is to disable this code, with rails_mimc mode. Because we already will have compatible with rails output, we do not need it any more. As my benchmarks showed this code makes to_json slower and consume more memory even if Oj mimics JSON gem.

as_json would just return itself so now not only is there a use_as_json flag but Oj has to detect when the call to as_json returned itself and does not allow as_json the second time.

According to "as_json spec"

It SHOULD return a "meaningful" Ruby representation of the object.

e.g. you should not care what it returned, just encode it to json. No need to detect circular calls here.

ohler55 commented 7 years ago

Oj can be made to not call the as_json on the classes it handles directly but of course the use may intend to change the as_json method on some class such as Range (just an example). If Oj refuses to call as_json all the time that a problem since there is no work around. With an option the user can allow it to be called.

Unfortunately the as_json spec allows for conditions that encoders can not be reasonably expected to detect. An object X has an as_json method so it should be called. X returns itself so the encoder says, oh, look I have X and it has an as_json method. That creates a loop. The encoder has to be able to detect that and not call the as_json method again. It get worse when X returns Y and Y.as_json returns X. I have also seen X return a copy of itself so it is a different object.

Another case, X returns Y and the encoder does not call Y.as_json. Now the results no longer match what rails was expecting.

So we are back to the issue being when as_json should be called when the goal is being rails or json gem compatible.

stereobooster commented 7 years ago

I'm talking about mode which will call as_json only if this class is not special case which already handled by Oj (as_json_for_non_predefined). For example Range class:

class Range
  def as_json(options = nil) #:nodoc:
    to_s
  end
end

Other side note: we probably want rails compatibility layer as separate gem. So we can specify exact supported versions of Rails, while Oj itself can be developed and released independently. Otherwise every change in rails interface will require Oj change, plus support of old Oj version, which compatible with old rails versions, plus backporting fixes to that old versions.

ohler55 commented 7 years ago

I think we are in agreement on the mode. As for the gem, maybe not. In order to keep performance fast the as_json mimic code will need to be in C. It defeats the purpose to go from C to Ruby and back to C in a separate gem. You do bring up a good point though. Rails is a moving target so having some way to denote not only rails by the rails version will be important. I'm not too worried about having to update the code with new rails releases. That seems to be the case anyway as more fundamental changes occur with each rails release anyway. Oj will just have to deal with it.

I am part way through the code refactoring. The different mode are now, or will be, better encapsulated so it will be easier and cheap to create new modes. Lets go a little further in the compatibility changes and see how much the rails versions differ. If it is small maybe it can be dealt with without making another mode (my preference). Oj should be able to detect the rails version through some variable I hope.

ohler55 commented 7 years ago

I know rail-compat is broken right now. It is in the process of being moved to a v3.0 of Oj. The json compatibility tests should all pass now. I will be adding more unit tests for various options tomorrow.

Anyway the plan is to have like this: strict: similar or the same to the current strict mode null: same as current compat: aim for 100% JSON gem compatibility. This is much more limiting than today. For example, the JSON gem does not support as_json. mimic_JSON will need to be called for all the JSON gem functions to be present. rails: aim for 100% rail compatibility. The idea is to have a faster version of as many as_json calls as reasonable. custom: Make use of all the various options.

The compat and rails mode will not make use of any of the options other than the ones that match those in the json gem which are primarily for formatting.

This will be Oj 3.0 as the additional modes are new and the rails users will have to move to the rails mode making it a breaking change even though it is an easy migration by just changing the mode.

See what you started. :-)

stereobooster commented 7 years ago

To recap:

Oj. mimic_JSON will override JSON.parse, JSON.generate and some .to_json. Same time it will not affect Oj.load Oj.dump and Oj.default_options. JSON.parse and JSON.generate will work according to JSON spec and will not accept Oj params, only JSON params. After Oj. mimic_JSON everything JSON-related will work the same way as before (which is now not the case, see omitted tests which I committed).

Is that your plan?

ohler55 commented 7 years ago

That is the plan. There should be no omitted tests by the time we are done.

ohler55 commented 7 years ago

Additions tests are all passing now. Started on the common interface tests. Not sure about some of them such as making sure the JSON.parser name ends with Parser. Any option on those?