Closed stereobooster closed 7 years ago
I like the benchmarks. I'm not sure I understand the table though. If it is a comparison to JSON how can JSON fail on some types?
When running mimic_JSON it sets up the defaults. I thought I had them correct but maybe not. I'll check against your settings.
I'll take a look at your tests as well to see why you are having issues with DateTime and ActiveModel::Errors. DateTime in particular seems odd as there are several unit test for that type.
Comparing to Rails to_json
which is not the same as JSON.generate
Ah, got it, thanks. I followed reference you provided on some other page to look at rails as_json(). I noticed it too is deprecated but there is no suggested alternative. Do you happen to know what will replace it?
I haven't found official info yet. Maybe this one active_support/json/encoding.rb
Update:
require 'oj/active_support_helper'
.escape_mode: :xss_safe
test for StringChinese: 'δΊθ‘',
fails.And found some unsupported classes (at least some of them doesn't work)
Complex: Complex('0.3-0.5i'),
Exception: Exception.new,
OpenStruct: OpenStruct.new(:country => "Australia", :population => 20_000_000),
Rational: Rational(0.3),
Usually I would use mimic_JSON but in this case you are comparing JSON to OJ so that doesn't work. I have a few suggestions and there are some changes that need to be made to Oj.
You can get rid of the comparison with :object mode. It was never intended to be anything like the JSON gem. It is for efficient encoding and decoding. By the way, the DateTime worked fine for me here with your code with the datetime key compare removed.
Next the options should be like this:
OJ_2 = { mode: :compat, use_as_json: false, float_precision: 16, bigdecimal_as_decimal: false, nan: :null, time_format: :xmlschema, second_precision: 3 }
OJ_3 = { mode: :compat, use_as_json: true, float_precision: 16, bigdecimal_as_decimal: false, nan: :null, time_format: :xmlschema, second_precision: 3 }
My first set of fixes for OJ are these but I will get to the others and figure out how to deal with the ActiveSupport as well. Maybe the as_json is th eonly way for those though.
- round off of date when sec prec should be round, not cut off
- for DateTime.new should include fraction
- BigDecimal infinity should follow :nan option
There should be a new branch called
We might be using a different version of active support. I just installed the latest set. Using the oj/active_support_helper adds the datetime creation function so it might be replacing the existing one. Lets no do that for now.
if escape_modes fails when set to :xss_safe let not do that. :-)
If you keep your compatibility test up to date I'll keep pulling and we can work toward a 100% match on compat mode.
I updated test results
I'll keep updating the rail-compat branch. Should be able to get more positive results over the next few days.
I narrowed down problem with DateTime
if Rails::VERSION::MAJOR == 5 && RUBY_VERSION == "2.2.6"
# see https://github.com/ohler55/oj/commit/050b4c70836394cffd96b63388ff0dedb8ed3558
require 'oj/active_support_helper'
end
huh... I got an idea if we could get to 100% compatibility with rails, we can patch .to_json
after rails again.
Make .to_json
fast again
The tricky part is going to be the active support types I think. Maybe expanding the active_support_helper will take care of that though. Working together to hit 100% would be great.
From my POV real stoppers are
DateTime
. I can reproduce it. So maybe I can use DTrace to get more informationescape_modes
. If I set it to true, tests for multibyte strings fail, if I set it to false test for escape string fails.Plus there are potential incompatibilities with JSON gem, which is also support (got list from here):
But this can be tracked in other ticket.
Tracking here is fine. I'll be using this to get differences resolved. I have a day job so will work on it this evening. PST time.
Regexp now compatible and I added a new :unicode_xss escape mode to allow unicode but < > and some others like \u2028 are escaped.
Started on Enumerable. I discovered to_json ends up calling as_json and then drops back into Oj to encode. I'm not sure it makes sense to try and intercept that and do a direct encoding. If to_json uses as_json then maybe that is the correct approach for some of the more unusual types. What are your thoughts?
If to_json uses as_json then maybe that is the correct approach for some of the more unusual types. What are your thoughts?
This seems to be a reasonable approach here. And also it is pretty good in benchmarks. Feel a bit guilt, because you are doing all heavy lifting here. Will try to do something useful too.
You provide the knowledge of rails. Something I am lacking.
I assume purpose of this set of changes is to keep rails as fast as possible and thats why you have been trying to avoid as_json. Is that correct? If so it sounds like the approach should be use Oj first and then revert to as_json for the more complete objects. We have to figure out which category each type falls into.
I assume purpose of this set of changes is to keep rails as fast as possible
yes
and thats why you have been trying to avoid as_json
no. Thats why I was trying to avoid Rails implementation of to_json
.
In rails project to_json
is slower and consumes more memory than JSON.generate
even if I use Oj.mimic_JSON
. See:
And I found reason for this:
I want achieve total compatibility and remove Rails magic, then it will be real improvement
I imagine something like this
def mimic_rails
require 'oj'
if Rails::VERSION::MAJOR == 5 && RUBY_VERSION == "2.2.6"
# see https://github.com/ohler55/oj/commit/050b4c70836394cffd96b63388ff0dedb8ed3558
require 'oj/active_support_helper'
end
Oj.default_options = {
float_precision: 16,
bigdecimal_as_decimal: false,
nan: :null,
time_format: :xmlschema,
second_precision: 3,
escape_mode: :unicode_xss,
mode: :compat,
use_as_json: true,
}
Oj.mimic_JSON()
begin
require 'json'
rescue Exception
end
module ActiveSupport
module ToJsonWithActiveSupportEncoder # :nodoc:
def to_json(options = nil)
super(options)
end
end
end
end
Something I am a little surprised with is that Oj with as_json is the faster of the 3 Oj combinations. I wonder if that is due to the exceptions.
Anyway with the goal in mind of making rails faster maybe you can help with some temporarily modified tests. If the types that raise an exception are commented out, is straight compat without as_json faster than with as_json? What that will tell us is whether it is best to focus on eliminating as_json calls or to continue with them.
Assuming calls to as_json are expensive then an approach that avoid them when possible can be take. Maybe even a special class handler for the common active support types.
Nevermind on the request. I see in benchmark2.rb the two type that raised exceptions were removed.
My results with benchmark2.rb were slightly different on a Ubuntu machine with the rails-compat branch.
user system total real
to_json: 1.620000 0.000000 1.620000 ( 1.619461)
Oj.dump o 0.150000 0.000000 0.150000 ( 0.144752)
Oj.dump c 0.290000 0.000000 0.290000 ( 0.294706)
Oj.dump c, aj 0.240000 0.000000 0.240000 ( 0.235439)
I pushed to compatibility test repo with rails-compat
branch. For some reasons can't run benchmarks. Important note: I'm talking about speed, but it is actually memory and allocations initial issue for me. For example if you are using something like logstash-logger, which converts logs to JSON to send them, and you do a lot of logging you will end up with huge number of allocations and objects trashing.
Related: https://github.com/dwbutler/logstash-logger/pull/118
Interesting you mention using JSON for log entries. You might be interested in my piper-ruby gem, https://github.com/ohler55/piper-ruby. Super simple. Coupled with NATs for publishing and Piper for you get log files and the start of an ops console.
Found what is the problem with Rational
. Oj
in "rails compat" mode returns "\"5404319552844595\\/18014398509481984\""
while rails code doesn't escape /
. It escapes only \u2028\u2029><&
.
That is unfortunate since the JSON spec calls for it to be escaped. Maybe a note indicated that mode may generate invalid JSON.
In my tests .to_json
and JSON.generate
both generate same "wrong" JSON. Added StringSpecial2
("\\/"
) to compatibility tests.
I can not insist to use this option by default, because there is JSON standard, but it would be nice at least have an option to change this
Okay, today's updates were for Rational, Struct, Enumerable, and the "\/" change for the unicode_xss escape mode.
I also figured out why as_json was faster than just compat. On some of the active types all the attributes were dumped which was more expensive than just the desired ones. That means avoiding as_json (or to_json) is the way to go. I'm going after the non-active ones first and will then decide how best to handle the active ones. Maybe if the ActiveXXX module then also do an as_json for that only. Might need a different rails mode or something.
With the active type commented out along with other non-passing I get this:
user system total real
to_json: 1.290000 0.000000 1.290000 ( 1.286998)
Oj.dump o 0.100000 0.000000 0.100000 ( 0.099203)
Oj.dump c 0.080000 0.000000 0.080000 ( 0.084050)
Oj.dump c, aj 0.150000 0.000000 0.150000 ( 0.156359)
I haven't updated to use your latest tests since you mentioned the benchmarks were not working. If it working again let me know and I will do a pull.
I'm thinking of some hybrid mode, which will use Oj built in serialisation for known objects, like Regexp, Range etc. and will fallback to as_json
otherwise.
Before adding even more options lets count what we have now:
Parsing: 4*4*2*2*2*2*2*2*2 * 2*2*4*2*2*5 = 655360
Serialization: 4*4*2*2*2*2*2*2*2 * 2*2*2*3*2*2*2*2 = 786432
For all non sets values I substituted 2 (for Numeric or Class params). Not every combination makes sense.
This is just side-note. This will not stop us from doing Rails compatible mode and add good test coverage for it
Yup, there are a lot of options. They grow over time as people ask for various features.
Back to the hybrid mode. I was thinking along similar lines but something like this in combat mode.
Note only the active ones will have as_json called. If non-active types should have as_json used then turn on the use_as_json option. This should keep compatibility with previous versions and non-rails use yet give a huge performance boost to rails. I am shooting for 10x over to_json will the full test types.
Range is now rails compatible.
Your tests made it very clear that to_json is not the same as rails. Maybe compat mode should be for to_json and a new rails mode for rails compatibility. Is that your understanding as well?
This is just meditation on the subject, do not take it as final judgment or critic.
Having global settings lowers portability. If two different libs assume different global settings, you won't be able to use them without modification.
Oj.dump(value)
(or Oj.load
) should always return the same value in all environments until you use the same version of a gem. Portability not guaranteed across different versions of gems.
If users want to have global settings, they can easily create wrapper class, wich will expose default settings behavior.
class OjGlobal
class << self
attr_accessor :default_options
end
def dump(value, options = {})
Oj.dump(value, self.class.default_options.merge(options))
end
def load(values, options = {})
Oj.load(value, self.class.default_options.merge(options))
end
end
Oj.mimic_JSON
will patch JSON
object, and all objects which are patched by json
gem.
JSON.parse
(or JSON.generate
) will behave the same way before and after the patch, e.g. it will produce same results, and accept same options, which are acceptable by json
gem.
Same time Oj.dump
doesn't have to behave the same way as JSON.generate
before or after Oj.mimic_JSON
call.
There is also will be Oj.mimic_rails
, which will internally call Oj.mimic_JSON
if it hasn't been called before and will patch the same set of objects which is patched by Rails and will remove Rails magic to improve speed.
After Oj.mimic_rails
call .to_json
(or JSON.generate
) will behave the same way as before.
If some users want to change the default behavior of JSON
or .to_json
they can easily do it with built in Ruby facilities themselves.
By removing global settings we will simplify tests. We will need to test default mode, test compatibility for Oj.mimic_JSON
and Oj.mimic_rails
and simple unit tests for all possible combinations of second argument for Oj.dump
and Oj.load
.
Plus we have TravisCI setup with a matrix of all possible combinations of Rails and ruby. This will guarantee our users that they can safely plug Oj
in covered Rails and ruby versions combinations without fear and get an instant speed boost and memory improvement.
What do you think?
I think if Oj was not trying to be compatible with the json gem and rails your approach make complete sense and Oj. actually started out without global options. As soon as rails support came into the picture people demanded some way of have setting globally so they did not have to either call with those setting every time or wrap every write wrappers which rails would then ignore. Right now moving to that model would be a major change in the API and I know it would break a lot of code out there.
Having said that you might prefer another direction I was thinking about. I was not planning on a mimic_rails but instead was thinking of coming up with a set of options that would make Oj rails compatible. By providing those in a function call like give_me_the_rails_options() that could be used as an argument to dump or generate as your describe or could be used as globals. That would leave the existing APIs in place with no change other than the addition of calls to get pre-defined option sets.
I'm afraid globals aren't going away but deprecating their use in favor of options passed in on each call might be a way forward. Comments?
Anyway, with the addition of yet another mode, i.e., rails I am going to take a day or two to refactor the dump code a bit. It will be cleaner and maybe even slightly faster.
I think if Oj was not trying to be compatible with the json gem
Generally yes. If you use standalone Oj, than it has nothing to do with JSON
gem.
But if we talk about mimic_JSON
, then it makes sense to think about compatibility. Not sure what is the point to replace existing interface with something incompatible. Yes existing interface maybe dumb, wrong, irrational or anything else. But this is how its done. We have versions of existing interface (ruby version), and we can track compatibility with different versions by stating ruby version in Gemspec. Anything wrong about current implementation of JSON
gem can be reported as bug in ruby bugtracker.
You can think of mimic_JSON
as separate layer (gem), which will bring compatibility with JSON gem, while Oj will be pure library with settings, but without globals.
I'm afraid globals aren't going away but deprecating their use in favor of options passed in on each call might be a way forward.
I'm generally not proposing to break everything right away. We can, for example, expose OjCore
which will be like Oj
but without globals, same time Oj
will become wrapper for OjCore
. Oj
will have same interface as now plus deprecation notice for globals.
OjCore
will be reused in compatibility modes for JSON
and rails
, but main idea here is that global settings of Oj
will not influence compatibility modes. I guess this is breaking changes. But I'm ready to hear arguments, why somebody decided break compatibility in the first place. I mean you don't do things like this:
class Object
def to_json
"bla bla"
end
end
Why would you chose to replace JSON methods with something incompatible.
My intention is to have: compatible layer (so you do not need to worry, if it works or not), reliable (having globals make situation worse), fully tested and transparent in sense of supported versions (of ruby and rails).
Note on as_json
.
This proposal says:
JSON.parse (or JSON.generate) will behave the same way before and after the patch
After it will behave as it is expected to behave before. E.g. if a user will redefine as_json
function for some built-in class, like Hash or Array, this will be not covered by Oj
. But I do not see any problems to use as_json
for everything else what is not built-in. Or I miss something and this will be huge performance drop even for built-in objects?
Nice xkcd comic.
Actually, the use of as_json or to_json is a significant performance hit. The tricky part is knowing when the user wants as_json called and when they don't. The user can add as_json to any class as evident with rails and active. The options use_to_json and use_as_json exist to provide a simple on/off for using those methods. What we are both talking about for rails is something that is not a simple on/off switch but some semi-intelligent approach to only using as_json when we don't want to mimic rails son encoding in Oj. Of course that means we will miss any user defined as_json unless the use_as_json switch is through or some kind of indicator is attached to the user class or in a global registry table. (a global table is faster and could be built from some indicator on the class)
I think we are agreeing on the direction Oj should move. The details of how to get there may differ a little but I think a lot of that is the way we are each describing the steps involved. Lets plan on getting a set of options that work for the JSON gem and rails and then see about which other functions should be exposed. You might be pleased to know that in the C code the options are passed around and the globals are only visible for a merge with the call options at the start so adding a new-non-global dump call would be trivial.
What we are both talking about for rails is something that is not a simple on/off switch but some semi-intelligent approach
I wouldn't call it "intelligent" while it is a simple matrix of covered and non-covered cases. Let's switch to exact use cases, this will make more transparent what I mean.
1) I want most fast JSON implementation, I'm aware that speed comes with the tradeoff of incompatibility. Solution: use Oj
directly, choose settings manually according to your requirements.
2) I'm using JSON
in Ruby (non-rails) project. I want faster implementation. Solution: use Oj.mimic_json
right after you required json first time. No more worries for you.
3) Same situation as 2, but I also have this kind of code:
class FancyClass
def as_json
{bla: "bla"}
end
end
Solution: same as previous.
4) Same situation as 2, but I also have this kind of code:
class Hash
def as_json
{bla: "bla"}
end
end
Solution: we do not support this case out of the box. Build your own compatibility mode (using Ruby code, I do not see a need for C code here), use our code as an example or do not use oj
.
5) I have Rails project. I want a faster implementation of JSON. Solution: use Oj.mimic_rails
right after you required rails the first time. No more worries for you.
6) Same situation as 5, but I also have this kind of code:
class FancyClass
def as_json
{bla: "bla"}
end
end
Solution: same as previous.
7) Same situation as 5, but I also have this kind of code:
class Hash
def as_json
{bla: "bla"}
end
end
Solution: we do not support this case out of the box. Build your own compatibility mode (using the ruby code, I do not see the need of C code here), use our code as an example or do not use oj
.
Do I miss something in sense of implementation? Maybe there are some technical problems in sense of C implementation which I'm not aware of?
We will not guess what user wants, we will limit the number of use cases to most reasonable. For all non-standard use cases, a user can get support with his own small wrapper code. The same situation as with CRA, if you use it as intended you get all benefits and updates. If you chose to eject
you are on your own.
I am all for just addressing the primary use cases. I believe use case 4 and 7 are built into the json gem and rails though as they both monkey patch Hash as well as others. Cases 1, 2, and 3 are covered but FancyClass work only when as_json is allowed. With the current implementation that also allows all other as_json methods which means Oj always calls the json gem monkey patches and is slow as your benchmarks show.
A little history on the to_json and as_json. Initially those were called if the object had that method. That broke with rails because rails would immediately call generate and a loop occurred. as_json would just return itself so now not only is there a use_as_json flag but Oj has to detect when the call to as_json returned itself and does not allow as_json the second time. It really is a mess. I'm getting off topic though.
ok, use cases.
I wish it were that simple. Oj was originally a replacement serializer that was used to encode object and save to a database. That is the current object mode. It gave us a huge speedup. That would be OJ out-of-the-box with no options. To have stopped there and force users to write ruby code to be json gem compatible would have meant Oj would not have gather the large user base it currently has.
I hear and understand what you are saying. I just don't know how to get Oj to satisfy the various use case and also provide the high performance. You are giving me good insight into were the problems are though. We keep coming back to how as_json in handled.
A little history on the to_json and as_json. Initially those were called if the object had that method. That broke with rails because rails would immediately call generate and a loop occurred.
I suppose you are talking about this code. My propose is to disable this code, with rails_mimc mode. Because we already will have compatible with rails output, we do not need it any more. As my benchmarks showed this code makes to_json
slower and consume more memory even if Oj mimics JSON gem.
as_json would just return itself so now not only is there a use_as_json flag but Oj has to detect when the call to as_json returned itself and does not allow as_json the second time.
According to "as_json spec"
It SHOULD return a "meaningful" Ruby representation of the object.
e.g. you should not care what it returned, just encode it to json. No need to detect circular calls here.
Oj can be made to not call the as_json on the classes it handles directly but of course the use may intend to change the as_json method on some class such as Range (just an example). If Oj refuses to call as_json all the time that a problem since there is no work around. With an option the user can allow it to be called.
Unfortunately the as_json spec allows for conditions that encoders can not be reasonably expected to detect. An object X has an as_json method so it should be called. X returns itself so the encoder says, oh, look I have X and it has an as_json method. That creates a loop. The encoder has to be able to detect that and not call the as_json method again. It get worse when X returns Y and Y.as_json returns X. I have also seen X return a copy of itself so it is a different object.
Another case, X returns Y and the encoder does not call Y.as_json. Now the results no longer match what rails was expecting.
So we are back to the issue being when as_json should be called when the goal is being rails or json gem compatible.
I'm talking about mode which will call as_json
only if this class is not special case which already handled by Oj
(as_json_for_non_predefined
). For example Range
class:
Oj
class Range
def as_json(options = nil) #:nodoc:
to_s
end
end
as_json
for Range
as_json
for Range
class, it means he doesn't want "rails compatible" mode, because he just broke rails specification. He can use as_json: true
or we can call it as_json_for_all
, with performance dropas_json
for this class.Other side note: we probably want rails compatibility layer as separate gem. So we can specify exact supported versions of Rails, while Oj
itself can be developed and released independently. Otherwise every change in rails interface will require Oj change, plus support of old Oj version, which compatible with old rails versions, plus backporting fixes to that old versions.
I think we are in agreement on the mode. As for the gem, maybe not. In order to keep performance fast the as_json mimic code will need to be in C. It defeats the purpose to go from C to Ruby and back to C in a separate gem. You do bring up a good point though. Rails is a moving target so having some way to denote not only rails by the rails version will be important. I'm not too worried about having to update the code with new rails releases. That seems to be the case anyway as more fundamental changes occur with each rails release anyway. Oj will just have to deal with it.
I am part way through the code refactoring. The different mode are now, or will be, better encapsulated so it will be easier and cheap to create new modes. Lets go a little further in the compatibility changes and see how much the rails versions differ. If it is small maybe it can be dealt with without making another mode (my preference). Oj should be able to detect the rails version through some variable I hope.
I know rail-compat is broken right now. It is in the process of being moved to a v3.0 of Oj. The json compatibility tests should all pass now. I will be adding more unit tests for various options tomorrow.
Anyway the plan is to have like this: strict: similar or the same to the current strict mode null: same as current compat: aim for 100% JSON gem compatibility. This is much more limiting than today. For example, the JSON gem does not support as_json. mimic_JSON will need to be called for all the JSON gem functions to be present. rails: aim for 100% rail compatibility. The idea is to have a faster version of as many as_json calls as reasonable. custom: Make use of all the various options.
The compat and rails mode will not make use of any of the options other than the ones that match those in the json gem which are primarily for formatting.
This will be Oj 3.0 as the additional modes are new and the rails users will have to move to the rails mode making it a breaking change even though it is an easy migration by just changing the mode.
See what you started. :-)
To recap:
Oj. mimic_JSON
will override JSON.parse
, JSON.generate
and some .to_json
. Same time it will not affect Oj.load
Oj.dump
and Oj.default_options
. JSON.parse
and JSON.generate
will work according to JSON
spec and will not accept Oj
params, only JSON
params. After Oj. mimic_JSON
everything JSON
-related will work the same way as before (which is now not the case, see omitted tests which I committed).
Is that your plan?
That is the plan. There should be no omitted tests by the time we are done.
Additions tests are all passing now. Started on the common interface tests. Not sure about some of them such as making sure the JSON.parser name ends with Parser. Any option on those?
It took me some time to pickup all settings. But I was able to setup it. And it is most compatible thing with Rails
to_json
, and much faster than Railsto_json
. Thanks a lot for this gem.My repo for benchmarks. During my tests found some bugs. Will report here:
DateTime
inmode: : object
, actually it takes forever, haven't seen the error itselfActiveModel::Errors
inmode: : object
andmode: : compat
.Oj
doesn't pickup right value forfloat_precision
. Need to set it manualy.Questions:
Suggestions:
{ mode: :compat, use_as_json: true, float_precision: 16, bigdecimal_as_decimal: false, nan: :null }
Other notes: my OS is macOS Sierra.