ruby / json

JSON implementation for Ruby
https://ruby.github.io/json
Other
704 stars 333 forks source link

Memory leakage since 2.4.0 #460

Open lizdeika opened 3 years ago

lizdeika commented 3 years ago

We have a couple of applications where postmark uses this gem. After updating bunch of gems we noticed a memory leak in sidekiq processes(we have a script that restarts sidekiq when memory is bloated). We had to revert all the recent gem updates and the problem was gone. Then we updated gems back one by one to find out which one caused the leakage. The problem returned back with json gem 2.4.0 update and reverting back to 2.3.1 was all good once again.

nitaliano commented 3 years ago

Seeing the same exact issue in the project i'm working on recently updated from 2.3.0 -> 2.5.1 and server memory is growing overtime

marcandre commented 3 years ago

@lizdeika and @nitaliano that's pretty serious, thanks for pinpointing to json and a precise version too.

If I'm not mistaken, there were really only two changes between 2.3.1 and 2.4: #405 and #447. I reread the diff and nothing jumps at me.

It would help to know: 1) Which Ruby version are you using? 2) Are you parsing JSON, generating JSON, or both? 3) Are you using, directly or via a dependency, the new freeze: true option when parsing, or the escape_slash: true when generating?

Question 3 could be partially answered by checking the options passed to JSON::Parser.new like this:

require 'json'
require 'set'
$json_options = Set.new

module OptionPeeker
  def initialize(source, **opts)
    if $json_options.add?(opts)
      p "New JSON parser options:", opts
    end
    super
  end
end

JSON::Parser.prepend OptionPeeker

(You may want to replace p with some other output mechanism)

Of course, a script we could run to reproduce the leak would be the best...

marcandre commented 3 years ago

Also, if either of you is using jruby, or the pure-ruby version of json, make sure to say it...

lizdeika commented 3 years ago

it was CRuby 2.6.5 compiled against jemalloc 5.2.1 sorry, no idea how postmark gem uses it unfortunately can not run the script as I have no access to the project already

nitaliano commented 3 years ago

Here are some extra details

  1. Ruby version 2.6.6
  2. Use Case: parsing and generating; I scanned through our code its its mostly using JSON.parse, JSON.dump, and JSON.pretty_generate. It looks like it is mostly being used for parsing HTTP request bodies, and generating JSON for logs which I am the most suspicious about.
  3. We are using it directly as a dep

We are also using sidekiq like @lizdeika. Hopefully next time I post back I'll have a repo to reproduce it 😄

marcandre commented 3 years ago

To better pinpoint the problem, I created 2 branches of json, starting from 2.3.1, by adding the feature #447 (branch "freeze_231") and then also adding #405, which I believe is equivalent to 2.4.1 for the C code (branch "equiv_241")

So if it was possible to check using:

gem 'json', git: 'https://github.com/marcandre/json.git', tag: 'freeze_231'
# and if that does not leak, then try with:
gem 'json', git: 'https://github.com/marcandre/json.git', tag: 'equiv_241'
luke-gru commented 1 year ago

I know this is an old issue, but I'm interested in it nonetheless because I noticed something in this gem that could cause issues.

Are the JSON keys you're parsing known ahead of time and bounded to a certain set of strings, or are they unbounded?

If they're unbounded, this commit: 1982070cb84a38793277f6359387938d80e4d2c4 introduces a memory issue by keeping all json keys in the ruby VM's frozen string table (where they're never released, as far as I know).

cc @byroot

byroot commented 1 year ago

(where they're never released, as far as I know).

That's incorrect. "fstrings" (AKA interned strings) are garbage collectable.

luke-gru commented 1 year ago

Oh okay right, I just took at look at rb_str_free(), carry on then 😄

luke-gru commented 1 year ago

This might be nothing too, but in the parser there's a call to ALLOC_N and it gets free'd with free instead of xfree, could that be a source of memory leak if ruby is built a certain way? I've heard that it's an antipattern but I don't know why.

byroot commented 1 year ago

I've heard that it's an antipattern but I don't know why.

AFAIK, it's mostly because that skip letting the GC know that some memory was freed, so GC might trigger earlier than it should but that's about it.

Would be nice to fix it though.

casperisfine commented 3 weeks ago

Alright, not sure if it's the same one, but I did find a leak:

require 'json'

data = 10_000.times.to_a << BasicObject.new
20.times do
  100.times do
    begin
      data.to_json
    rescue NoMethodError
    end
  end
  puts `ps -o rss= -p #{$$}`
end
 20128
 24992
 29920
 34672
 39600
 44336
 49136
 53936
 58816
 63616
 68416
 73232
 78032
 82896
 87696
 92528
 97408
102208
107008
111808

The various #to_json leak memory if an exception is raised during generation. Shouldn't be too hard to fix.

byroot commented 3 weeks ago

I checked the leak I just fixed predates 2.4.0, so I'll reopen, but without more info this issue isn't really actionable.

I think the only action I could see would be to setup ruby_memcheck and see if it catches something.

luke-gru commented 3 weeks ago

@casperisfine Funny enough I found a similar leak many years ago:

https://github.com/ruby/json/issues/251