yohasebe / ruby-spacy

A wrapper module for using the spaCy natural language processing library from the Ruby programming language via PyCall
MIT License

`Spacy::Language.new` hangs when called from sidekiq worker #6

Open hoblin opened 10 months ago

hoblin commented 10 months ago

I built a little tokenizer in my app and it works great when called from the Rails console. But once I run it from a background job (I tried both Sidekiq and Solid Queue) or from a server process (Puma) with an after_create_commit callback, it just hangs the process and I have to kill it. It looks similar to this issue: https://github.com/mrkn/pycall.rb/issues/95

omartorresrios commented 8 months ago

Hi @hoblin. I've been struggling with this too for the last week and I finally found a workaround. Basically, I stopped using this gem since, as you say, it hangs the process. This is the line that causes all the problems: https://github.com/yohasebe/ruby-spacy/blob/6a51fc80c5a37fd317d7e91fef316537bb917b62/lib/ruby-spacy.rb#L360 Calling exec seems a bit risky.

I ended up using another approach for running Python scripts from Ruby. From your controller you can run: `some_value = \`python3 lib/python/filename.py "#{param1}" "#{param2}"\``. And in the .py file you can load the "en_core_web_md" model: `nlp = spacy.load("en_core_web_md")`. Hope that helps!
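For reference, shelling out like this is usually done with Ruby backticks plus argument escaping. Here is a minimal sketch of that pattern; the helper name and script path are hypothetical, and escaping the argument with Shellwords matters whenever the text comes from user input:

```ruby
require "shellwords"

# Hypothetical helper: run an external command with one escaped argument
# and return its stripped stdout. Shellwords.escape guards against shell
# injection when the argument comes from user input.
def run_script(command, arg)
  output = `#{command} #{Shellwords.escape(arg)}`
  raise "command failed: #{command}" unless $?.success?
  output.strip
end

# Calling a Python tokenizer script like the one described above would
# then look something like:
#   tokens = run_script("python3 lib/python/filename.py", some_text)
```

One caveat with this approach: every call pays the cost of starting a Python interpreter and loading the spaCy model, so it is better suited to occasional calls than to high-throughput tokenization.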

hoblin commented 8 months ago

@omartorresrios Thanks, mate. I actually got rid of everything spaCy-related in my app, and now I just have these four lines in my docker-compose:

  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    ports:
      - '8000:8000'

It's a lightweight image that I forked from an old, outdated (as anything in the NLP realm) public image to make a multi-platform build for my Kamal-based deploy. So now I just have a little lightning-fast API for tokenization.

adrienpoly commented 7 months ago

@hoblin I am trying to deploy an app with Kamal and a wrapper for spaCy. I tried using your Docker image, but I must be missing something: when I call Spacy::Language.new("en_core_web_md") it doesn't find the model. Would you mind sharing a bit more about your solution? Thanks

hoblin commented 7 months ago

@adrienpoly Sure. I use it as a microservice and call it via its HTTP API.

docker-compose.yml:

version: '3'

services:
  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    ports:
      - '8000:8000'

config/deploy.yml(Kamal):

servers:
  web:
    ...
    options:
      network: networkname
  jobs:
    ...
    options:
      network: networkname
accessories:
  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    port: 8000
    roles:
      - web
      - jobs
    options:
      network: networkname

lib/nlp/spacy.rb:

require "httparty"

module Nlp
  class Spacy
    HOST = Rails.env.development? ? "http://localhost:8000" : "http://projectname-spacyapi:8000"

    class << self
      def named_entities(text)
        HTTParty.post(
          "#{HOST}/ner",
          body: {sections: [text]}.to_json,
          headers: {"Content-Type" => "application/json"}
        ).dig("data", 0, "entities")
      end

      def tokens(text)
        HTTParty.post(
          "#{HOST}/pos",
          body: {text: text}.to_json,
          headers: {"Content-Type" => "application/json"}
        ).dig("data", 0, "tags")
      end
    end
  end
end
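The `dig` chains in that client assume a particular JSON shape in the spacy-server responses. A minimal sketch of how the extraction works, using a made-up sample payload (the field names follow the client code above, not a verified schema; check the spacy-server repo for the exact format):

```ruby
require "json"

# Illustrative /ner response body. The "data"/"entities" nesting mirrors
# the dig("data", 0, "entities") call in the client above; the entity
# fields here are assumptions for demonstration.
sample = <<~JSON
  {"data": [{"text": "Apple is based in Cupertino.",
             "entities": [{"text": "Apple", "label": "ORG"},
                          {"text": "Cupertino", "label": "GPE"}]}]}
JSON

entities = JSON.parse(sample).dig("data", 0, "entities")
labels = entities.map { |e| e["label"] }
# labels == ["ORG", "GPE"]
```

HTTParty parses JSON responses automatically when the Content-Type is right, which is why the client can call `dig` directly on the response object.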

The image itself is a fork of https://github.com/neelkamath/spacy-server, and the only difference from the original is the multi-platform build (amd64 + arm64): https://hub.docker.com/layers/hoblin/spacy-server/2-en_core_web_md/images/sha256-6c75186013463efca502b8621456fee8af0c3a75c2ea14e223674df9627a3327?context=repo

You can find the available endpoints here: https://github.com/neelkamath/spacy-server/blob/master/src/main.py#L62

yohasebe commented 2 months ago

Apologies for the delayed response!

I've addressed this issue by adding a timeout feature to Spacy::Language.new.

Please check out the updated README for details and give the latest version a try. Hopefully, this will resolve the hanging problem. If you're still running into issues, @hoblin's microservice approach could be a great alternative to explore.

Thanks again for bringing this to my attention.