Ruby objects in v8 are held as weak

sergeych commented 10 years ago

I have a proxy class that add to script context like:

    cxt['User']    = AppUser::UserProxy

And then create instances in code with new User(). When the load is not minimal (say, hundred of objects or so), my UserProxy instances are GC'd and I receive random errors from "Invalid Reference - probably recycled" to completely strange stuff. Ruby 2.1.3. It could be evaded by keeping strong reference somewhere, say, in constructor:

def initialize source=nil
  @@keeper << self
  # ....

But this is very ineffective solution as objects will persist and could not be recycled until the script is done. In my high-load targets it is hardly acceptable. Could you please make such references strong? After all, keeping weak reference to ruby "parent" object from live v8 object is a bug.

cowboyd commented 10 years ago

JS code does keep strong references to Ruby objects. The weak references are from the Ruby objects back to their corresponding JS objects. If you think about this you will see that this is necessary in order to keep JS->Ruby->JS cycles from preventing garbage collection.

In your example there is a javascript object User which has a strong reference to AppUser::UserProxy

if you call new User() it will create a _strong_reference to an instance of AppUser::UserProxy. If it is not creating a strong reference, then this is definitely a bug since this is not the expected behavior.

Can you submit a test case that reproduces this? That would help very much in debugging.

In the past there have been bugs with Ruby's weakref implementation

sergeych commented 10 years ago

My test show that it is not. My test script that fails is

  users = (User.find i for i in ids)
  debug users.length
  debug users[users.length-1]
  (debug x.displayName for x in users)
  push.ios( users, {foo: 'bar'} )

the push.ios calls ruby method which enumerates users in array and imitates sending pushes - and fails with "Invalid Reference - probably recycled" when calling user.inspect. If only happens when creating many users. Keeping another reference to created user ruby objects fixes the situation - from all this I suppose that either JS::User -> Ruby::UserProxy is either weak or there is a bug. From my experience "Invalid Reference - probably recycled" is a Ruby's message often caused by using the weak reference. And yes, it could be with ruby but - why in this case the reference is not strong?

The project is huge so I would try to extract small test case for this. So far have to use this ugly fix with keeping extra reference to each and every ruby object :(

and thank you - bindings are elegant and nice!

cowboyd commented 10 years ago

This may very well be a bug. It should be keeping a strong reference. That is the intended behavior. In other words, as long as the JavaScript object that represents it is in scope in your JavaScript context, then it should hold onto it, even if there are no references from ruby.

Can you share your test script in a form that you can share publicly?

sergeych commented 10 years ago

Here we go. The simplest test:

require 'v8'

puts "Rv: #{RUBY_VERSION}"

class UserProxy

  attr_accessor :name

  @@keeper = []

  def initialize
    @name = '--no--'
    # Comment next line to fail:
    @@keeper << self
  end

  def to_s
    "User(#{name})"
  end
end

class Env

  def debug *args
  end

  def test
    script = <<-END
      var users = new Array();
      for(var i=0; i<1000; i++) {
        u = new User();
        u.name = '#'+i;
        users.push(u);
      }
      debug("First: "+users[0].to_s);
    END

    cxt        = V8::Context.new timeout: 7000
    cxt[:User] = UserProxy
    cxt[:debug] = -> (me, *args) {
        puts args.map(&:to_s) * ' '
    }
    cxt.eval(script)
  end
end

Env.new.test

Notice line #14 - it's a hack I'm using to retain referenced ruby object. If it is not commented, it works as intended:

Rv: 2.1.2
First: User(#0)

but when I comment it out it immediately fails: http://pastebin.com/ey4HFJ87

I think that it is the same bug that fails to keep strong reference to the inner object.

It might be connected also: if you pass any argument to ruby inner class initialize() it comes wrong. The string will not be converted to the ruby string, and integer will cause failure on my setup.

Does it help?

sergeych commented 10 years ago

I should admit that because of many random bugs of this type under load the version is not usable at all in production. Pity on that :( but we have to either fix it or get rid of it. Gem works only as long as a load is low (lot of free memory and short scripts). In the multithreaded environment it fails.

sergeych commented 10 years ago

Confirmed the general weak referencing of ruby objects from your gem. Even cxt[:debug]= -> {} may cause failure in multithreaded environment with heavy load. Looks like most ruby objects you reference/use are not retained.

The workaround is disabling GC before creating V8::Context until the context is wiped. It works but limits script usage to short and simple tasks. Pity on that :(

cowboyd commented 10 years ago

All ruby values that are held by V8 are held via external references:

https://github.com/cowboyd/therubyracer/blob/master/ext/v8/external.cc#L36-L42

Obviously, there is something more at play here, but I don't have the time to find out exactly what's going on.... if you comment line https://github.com/cowboyd/therubyracer/blob/master/ext/v8/external.cc#L41

Does that solve your problem? If so, I think I may know what's up.

sergeych commented 10 years ago

I’m afraid mark and sweep GC does not work the way you have expected. rb_gc_register_address() does not mark object as referenced - there is no reference count anyway. Instead, you should provide mark() function to our “root” object (e.g. V8::Context or like) which, when called, should call rb_gc_mark() on all referenced ruby objects. All of them! Everything you have added with ctx[]= function, all ruby objects that were created by the script and referenced by alive javascript objects and so on. If you fail to mark any of the referenced objects GC will likely reclaim any of the above mentioned objects that will cause segfaults or extremely strange behavior (I observed a lot in my production).

So, you have to keep something like double list or like with all “live” objects you reference from script context, and rb_gc_mark() ‘em all on request.

There is also simpler but ugly solution: you add ruby hash (important: it must be as a @member or @@class_member or like) to your “root” class/instance and add/remove all active ruby objects to it - then ruby will mark them for you.

ps. I’m not sure what rb_gc_register_address() does (and likely nobody who does not know Japanese knows) but likely it was intended to mark static roots to your extension ‘static’ objects...

cowboyd commented 10 years ago

I understand that this is not working as expected, and I understand that we want to hang on to all objects that are embedded in the JavaScript context. Getting it right is going to be hard (as we can already see)

It had been my understanding that I could add arbitrary VALUEs to the root set via rb_gc_register_address, but googling around a bit on the mailing lists and stackoverflow seems to support the idea that rb_gc_register_address is only suitable for registering static addresses, and not dynamic ones. Thank you for pointing this out.

This means that we will somehow have to add the object to some ruby collection so that it will not be gc'd, or that we add it in C to some thread safe collection that is marked with every GC.

The problem is that we want the ruby object to be GC'd when it is no longer referenceable from javascript, so when the JS object is GC'd, we need to remove it from this global "live list" This is a pain if the live list is a ruby object like a hash because at the point when we need to register this object with the "root" object, we are in v8 code, and do not have the Ruby GIL.

One solution: when the object is first sent to JS, we could add it to a live list, and then when it is GC'd from JavaScript, we could enqueue it in a threadsafe queue to be unregistered in the next ruby gc. There is already a similar process that happens when JavaScript objects are released from Ruby.

sergeych commented 10 years ago

Yes, the best solution is to have thread safe fast doublie-linked c++ list or set (from now on, 'retained_objects') where put all ruby object referenced by the context in any way - e.g. anything that was added to context from ruby side and everything that was produced by the js.

Then, In our top-level ruby object (most likely V8::context) we should provide mark function, that should call rb_gc_mark() on all objects in out retained_object list.

When the JS GC's some objects with references to ruby objects, we should just remove corresponding ruby objects from retained_objects - that’s all, we do not need neither acquire GIL nor somehow “unmark” them. Simply remove from the retained_objects, and next time ruby GC passes, these objects will not be marked anymore and therefore will be reclaimed.

Sounds easy enough - except that every time everywhere in your code where you create or reference any ruby object - you should add to to the list and remove when you don not need it. I mean, Whenever you create ruby object (say, rb_new_str()), and keep reference to it, add it to the retained_objects.

sergeych commented 10 years ago

We still try to use gem in profuction - disabling GC and limiting concurrency makes error rate critical but no catastrophic - so I would like to help to fix it. Do you have plans to fix this soon? I would love to help, but I could not find (in reasonable time) all the places where we should keep references and all places where we should release them, so all I can effectively do is a ruby GC's mark part (which is, like the thread-safe collection, rather trivial). Otherwise, if you haven't intention to fix in in near future, I have to get rid of javascript ASAP - though it is pain as a big part is already written in coffeescript and it is a feature we like.

cowboyd commented 10 years ago

@sergeych I am actively working on this. I'm targeting the latest version of v8 so that I don't write obsolete code from day zero, and running into some irritating roadblocks. I cannot give you a time frame, but I will do my best to spend as much time on this in the following weeks.

sergeych commented 10 years ago

Nice! You do the right thing with new version of v8 - the only way to make it work well with ruby's GC is (imho) Persistent<>'s weak reference callback which gives a point where the linked ruby object could be reclaimed (no more marked). And the v8 version that was used lacks it. Also, I see no way how you can use ruby's weak references - from what I see, you should always use strong references as long as the corresponding v8 objects are alive, otherwise with long running scripts it will fails like now. And thank you in advance!

cowboyd commented 10 years ago

I've pushed the branch https://github.com/cowboyd/therubyracer/tree/manually-retain-ruby-objects which may help you out. It manually retains and releases the ruby objects using a std::map.

This is very experimental and it will leak memory (the entry in the map will be leaked after the ruby object is GC'd) but it will leak way less than if you had retained the ruby object altogether. Anyway, I wanted to push what was available, so you can try using it off of that branch. If that helps, then we can explore further.

See relevant commit here 9a5e1d06c7e423b620194d06b2da2240f55bf049

sergeych commented 10 years ago

Thanks! Will try!

By the way, why not to free GC::Queue in the eval() after the GIL is reacquired? I think it is safe to clear the list at this point. This will limit memory leaks to the script running which is in most scenarios acceptable. And all this should work fine assuming that all ruby objects you create, obtain or otherwise deal with outside local scope under GIL are kept in this queue.

Il giorno 08/nov/2014, alle ore 05:42, Charles Lowell notifications@github.com ha scritto:

I've pushed the branch https://github.com/cowboyd/therubyracer/tree/manually-retain-ruby-objects which may help you out. It manually retains and releases the ruby objects using a std::map.

This is very experimental and it will leak memory (the entry in the map will be leaked after the ruby object is GC'd) but it will leak way less than if you had retained the ruby object altogether. Anyway, I wanted to push what was available, so you can try using it off of that branch. If that helps, then we can explore further.

— Reply to this email directly or view it on GitHub.

mtparet commented 10 years ago

Having the same issue using https://github.com/cowboyd/handlebars.rb in sidekiq workers.

cowboyd commented 10 years ago

@mtparet did that experimental branch work for you?

mtparet commented 10 years ago

@cowboyd Still getting the error:

2014-11-10T19:06:15.933Z 11471 TID-11fwds WARN: undefined method `Exit' for nil:NilClass
2014-11-10T19:06:15.933Z 11471 TID-11fwds WARN: /home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:250:in `ensure in block (2 levels) in lock_scope_and_enter'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:250:in `block (2 levels) in lock_scope_and_enter'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:245:in `call'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:245:in `HandleScope'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:245:in `block in lock_scope_and_enter'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:244:in `call'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:244:in `Locker'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:244:in `lock_scope_and_enter'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:204:in `enter'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:94:in `eval'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:231:in `block in load'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:230:in `open'
/home/matt/.rvm/gems/ruby-2.1.3/bundler/gems/therubyracer-9a5e1d06c7e4/lib/v8/context.rb:230:in `load'

Not sure if I should open another issue or if this is really linked to this issue. linked to https://github.com/cowboyd/therubyracer/issues/206 infact

sergeych commented 10 years ago

Yep. I’m getting it all the time. I suppose it may be collection to the same error - ruby object is lost by the time of call.

sergeych commented 10 years ago

I undertake some measures to keep my code from segfaults:

Forbid ruby GC from Context creation (important!) untul last eval() is done
Forced single threaded environment as much as possible (what causes me great degrade of productivity)
Keep a Hash with all ruby object I created from JS

But I do not keep references to context initializations like ctx[:debug] = ... - think that these functors among all other ruby objects created in them are still occasionally collected...

With all these, I'm still getting errors (now only with long-running and/oe memory consuming scripts):

undefined method `SetHiddenValue' for nil:NilClass
undefined method `Exit' for nil:NilClass
undefined method `Run' for nil:NilClass
random segfaults

Tomorrow will dare to check the effect of the expermental branch on production ;)

sergeych commented 10 years ago

Experiments how that the new branch show less or none "object recycled" issues, but the errors like "undefined method `Exit' for nil:NilClass" and the same with SetHiddenValue and like happen periodically. Thanks anyway!

cowboyd commented 10 years ago

@sergeych Well, if we can separate it into two separate bugs, then that at least is something. What percentage of your errors did the "object recycled" issues account for?

sergeych commented 10 years ago

@cowboyd After all my hacks (to retain objects) and your branch I do not experience positively identified "reclaimed ruby object" bugs. Still I'm getting huge amount of "undefined method Exit for nil:NilClass", and times less same errors concerning "SetHiddenValue" and "Run" methods. I think it is time to open another issue. Also, from the frequency and names I suppose it might be v8::Context that was kept in Handle<>, not Permanent<> that is occasionally recycled. But not certainly.

rubyjs / therubyracer

Ruby objects in v8 are held as weak #320