yohasebe / ruby-spacy

A wrapper module for using spaCy natural language processing library from the Ruby programming language via PyCall
MIT License
63 stars 5 forks source link

TypeError: class or module required #2

Closed mscrivo closed 1 year ago

mscrivo commented 1 year ago

Hi @yohasebe, first off, thank you for this wonderful gem, it's been a very useful tool for us! We are having a bit of an issue with it lately that I'm stumped on and hoping you might be able to shed some light on. Perhaps the issue is completely on pycall and should be file over there, let me know. Anyhow, we have a singleton where we instantiate an instance of the lib and do some basic sentence segmenting. Code looks like this:

require 'ruby-spacy'

module MyModule
  class NLP
    include Singleton

    def initialize
      @nlp = Spacy::Language.new('en_core_web_sm')
    end

    def segment_sentences(text)
      sentences = @nlp.read(text).sents.map(&:text)

      ...
   end
end

segment_sentences is called by a single threaded process many times and what we're seeing quite often is many errors that look like this:

TypeError: class or module required
/home/affinity/vendor/bundle/ruby/3.2.0/gems/pycall-1.4.2/lib/pycall/pyobject_wrapper.rb:93:in `call_object': class or module required (TypeError)
    from /home/vendor/bundle/ruby/3.2.0/gems/pycall-1.4.2/lib/pycall/pyobject_wrapper.rb:93:in `call'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.4.1/lib/ruby-spacy.rb:70:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.4.1/lib/ruby-spacy.rb:228:in `new'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.4.1/lib/ruby-spacy.rb:228:in `read'
    from /home/lib/nlp.rb:56:in `segment_sentences'

What's interesting is that we had been getting this error periodically for quite some time, but since upgrading to Ruby 3.2, the number of them has exploded.

Do you have any idea what's wrong or how we can possibly mitigate the errors?

Thank you

yohasebe commented 1 year ago

That is interesting. Unfortunately, I was unable to reproduce the problem using your sample code in my environment with Ruby 3.2 (and Python 3.6.10 installed via pyenv) It seems more likely to be a problem on the PyCall side, but I will take another look anyway. Thanks!

yohasebe commented 1 year ago

@mscrivo, I have modified the code and released version 0.1.5.0. Could you try it and check if the problem still occurs?

mscrivo commented 1 year ago

will do! I'll get back to you early next week to let you know if it worked or not .. thank you!

mscrivo commented 1 year ago

deployed this version to our test environment and it seems we're still seeing roughly the same volume of errors as before :(

TypeError: class or module required
/home/affinity/vendor/bundle/ruby/3.2.0/gems/pycall-1.4.2/lib/pycall.rb:47:in `exec': class or module required (TypeError)
    from /home/vendor/bundle/ruby/3.2.0/gems/pycall-1.4.2/lib/pycall.rb:47:in `exec'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.0/lib/ruby-spacy.rb:216:in `initialize'
    from /home/lib/nlp.rb:27:in `new'
    from /home/lib/nlp.rb:27:in `initialize'
yohasebe commented 1 year ago

Hmmm... As a workaround, I added a retry mechanism to the constructor of the Spacy::Language class, update to version 0.1.5.1 and check it out.

The maximum number of retries (every 0.5 seconds) is 20 by default, but you can set it in the second argument of the constructor:

@nlp = Spacy::Language.new('en_core_web_sm', 30)

I hope this workaround is effective.

mscrivo commented 1 year ago

That didn't seem to help unfortunately. Looking at your change, it seems like you put the retry in a different initialize? We're hitting the error here: https://github.com/yohasebe/ruby-spacy/blob/490c685ff3f93056e54f5f39e7540a7dfa6c61dd/lib/ruby-spacy.rb#L64

yohasebe commented 1 year ago

You are correct, I have added a retry to the Doc constructor and released version 0.1.5.2. If this does not work, please provide some more details about your use case.

mscrivo commented 1 year ago

This is what we're seeing now:

RuntimeError: Error: Failed to construct a Doc object
/home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:68:in `rescue in initialize': Error: Failed to construct a Doc object (RuntimeError)
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:71:in `rescue in initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:62:in `initialize'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:236:in `new'
    from /home/vendor/bundle/ruby/3.2.0/gems/ruby-spacy-0.1.5.2/lib/ruby-spacy.rb:236:in `read'
    from /home/lib/nlp.rb:49:in `segment_sentences'

and it's taking 10s now, which is expected because of the 20 retries * 0.5s, but it's causing increased latency. What I'm noticing is that the errors don't start right away, but after the service has been running for a little while, and it gradually gets worse until it's restarted. I'll have to do some more investigating into why initialize is even being called so much. I would have thought with a singleton and a single thread, it wouldn't be initialized once on startup and that's it.

mscrivo commented 1 year ago

I would suggest to remove the retries as they clearly aren't the right solution. I will spend some time investigating this on my end and see if I can figure out what's going on and get back to you. Thank you for all your help so far!

yohasebe commented 1 year ago

Sounds good. I would appreciate it if you could let me know if you find out anything about the cause. Thanks!

mscrivo commented 1 year ago

Ok I was able to repro, one time at least, using the following code:

require 'benchmark'
require 'ruby-spacy'
require 'faker'

class NLP
  include Singleton

  def initialize
    @nlp = Spacy::Language.new('en_core_web_sm')
  end

  def segment_sentences(text)
    return [] if text.blank?

    @nlp.read(text).sents.map(&:text)
  end
end

n = 5000
i = 0
Benchmark.bm do |x|
  x.report do
    n.times do
      i = i+1
      if i % 10 == 0 then
        puts "GC start/compact ..."
        GC.start
        GC.compact
      end
      NLP.instance.segment_sentences(Faker::Lorem.paragraph(sentence_count: 500))
    end
  end
end

With that, I was able to see the following error on my machine:

  x.report { n.times {NLP.instance.segment_sentences(Faker::Lorem.paragraph(sentence_count: 500))}}
       user     system      total        real
GC start/compact ...
 NoMethodError: undefined method `each' for [Ratione odit atque., Nihil voluptatem autem...<full contents of string was here, but removed to make it shorter>]:Object
from /Users/michaelscrivo/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/pycall-1.4.2/lib/pycall/pyobject_wrapper.rb:52:in `method_missing'

Then if I ran it again in the same process, I got:

  x.report { n.times {NLP.instance.segment_sentences(Faker::Lorem.paragraph(sentence_count: 500))}}
       user     system      total        real
GC start/compact ...
 TypeError: class or module required
from /Users/michaelscrivo/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/pycall-1.4.2/lib/pycall/pyobject_wrapper.rb:93:in `call_object'

which mimics exactly what we're seeing in production. Once that happens, you cannot instantiate the object again in the same process, you have to restart. I even got it to crash and do a core dump once. So something is definitely interacting badly with Ruby's GC, and it looks like whatever changes they made in 3.2 made it much worse.

Here is the first part of the dump when it crashed:

 /Users/michaelscrivo/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/pycall-1.4.2/lib/pycall.rb:45: [BUG] Segmentation fault at 0x0000000000000000
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Control frame information -----------------------------------------------
c:0056 p:---- s:0304 e:000303 CFUNC  :__dict__
c:0055 p:0011 s:0300 e:000299 METHOD /Users/michaelscrivo/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/pycall-1.4.2/lib/pycall.rb:45
c:0054 p:0044 s:0292 e:000291 METHOD /Users/michaelscrivo/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/ruby-spacy-0.1.4/lib/ruby-spacy.rb:221 [FINISH]
c:0053 p:---- s:0287 e:000286 CFUNC  :new
c:0052 p:0006 s:0282 e:000281 METHOD (pry):50 [FINISH]
c:0051 p:---- s:0278 e:000277 CFUNC  :new
c:0050 p:0015 s:0274 e:000273 BLOCK  /Users/michaelscrivo/.rbenv/versions/3.2.0/lib/ruby/3.2.0/singleton.rb:127 [FINISH]
mscrivo commented 1 year ago

I've narrowed it down to the GC.compact call. If I have that in there, then at least 1 time out of 10 when running the code, it will exhibit the above problem. If I don't have the compact call, then I can run it dozens of times without failure.

mscrivo commented 1 year ago

Yep, definitely the GC.compact calls. We have a background thread in our app that calls GC.compact every 30 mins that we've had for a while now that has worked well without causing any problems up till now, so I totally forgot about it. After removing that from this particular deployment that uses ruby-spacy, the issue has disappeared.

yohasebe commented 1 year ago

Excellent! I'm glad you finally found a way around the problem. That said, I am not sure if the problem is caused by Ruby's GC alone. I tried out your code on different environments:

It took a while, but everything finished without error.

One thing I noticed was that your test code uses String#blank? provided by ActiveRecord. Also it uses Singleton class without an explicit require "singleton". Is there any chance that some rubygems loaded elsewhere interfere with the GC?

mscrivo commented 1 year ago

One thing I noticed was that your test code uses String#blank? provided by ActiveRecord. Also it uses Singleton class without an explicit require "singleton". Is there any chance that some rubygems loaded elsewhere interfere with the GC

Highly doubtful. We use compact in all our deployments and have been for several years and it's never been an issue. This problem only manifests with pycall it seems and you can see that in the stack dump above. It's specifically the nlp instance that no longer points to anything. I will try to whittle down the repro to just something using Pycall and if I can do that, I'll file a bug over there.

My naive guess is that Pycall does something clever with its FFI and relies on a specific memory mapping to the python objects it's holding on to and GC.compact messes that up, or it's some very bizarre timing issue with GC.compact. But it's beyond my depth of expertise.

mscrivo commented 1 year ago

Seems like we might be missing a rb_gc_mark somewhere

yohasebe commented 1 year ago

I forgot about a potentially relevant feature of PyCall:

Releasing the RubyVM GVL during Python function calls https://github.com/mrkn/pycall.rb#releasing-the-rubyvm-gvl-during-python-function-calls

Can you run your code inside a PyCall.without_gvl block to see if it helps to release Ruby's global VM lock?

mscrivo commented 1 year ago

I forgot about a potentially relevant feature of PyCall:

Releasing the RubyVM GVL during Python function calls mrkn/pycall.rb#releasing-the-rubyvm-gvl-during-python-function-calls

Can you run your code inside a PyCall.without_gvl block to see if it helps to release Ruby's global VM lock?

Tried with:

require 'benchmark'
require 'ruby-spacy'
require 'faker'

n = 5000
i = 0
Benchmark.bm do |x|
  x.report do
    n.times do
      i = i+1
      #if i % 2 == 0 then
        puts "GC start/compact ..."
        GC.compact
      #end
      PyCall.without_gvl do
        Affinity::NLP.instance.segment_sentences(Faker::Lorem.paragraph(sentence_count: 500))
      end
    end
  end
end

And the error was reproduced on the second try. This is not an issue of locking via the global lock, but rather PyCall losing it's memory mapping for the object when GC.compact is called.

yohasebe commented 1 year ago

I see. Could you submit a report to PyCall? Hopefully, they can find something relevant to this issue.

mscrivo commented 1 year ago

yeah will do .. I'll close this issue for now. Thanks again for walking through that with me!