hoblin opened this issue 10 months ago
Hi @hoblin. I've been struggling with this too for the last week, and I finally found a workaround: basically, I stopped using this gem since, as you say, it hangs the process. This is the line that causes all the problems: https://github.com/yohasebe/ruby-spacy/blob/6a51fc80c5a37fd317d7e91fef316537bb917b62/lib/ruby-spacy.rb#L360. Calling `exec` seems a bit risky.
I ended up using another approach for running Python scripts from Ruby. From your controller you can run:

```ruby
some_value = `python3 lib/python/filename.py "#{param1}" "#{param2}"`
```

And in the .py file you can load the `en_core_web_md` model: `nlp = spacy.load("en_core_web_md")`. Hope that helps!
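A slightly safer variant of this subprocess approach can be sketched with Ruby's stdlib `Open3`. The `run_script` helper and the stand-in command below are illustrative (not from the thread); the point is that passing arguments as an argv array avoids shell-interpolation problems when `param1`/`param2` contain quotes or shell metacharacters:

```ruby
require "open3"
require "rbconfig"

# Run an external script and capture its stdout, raising on failure.
# Safer than backticks with string interpolation: arguments are passed
# as an argv array, so nothing is re-parsed by a shell.
def run_script(*cmd)
  stdout, status = Open3.capture2(*cmd)
  raise "#{cmd.first} failed with status #{status.exitstatus}" unless status.success?
  stdout.strip
end

# Stand-in command so the sketch is runnable anywhere Ruby is; in the
# workaround above this would be:
#   run_script("python3", "lib/python/filename.py", param1, param2)
puts run_script(RbConfig.ruby, "-e", "puts ARGV[0]", "hello")
```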
@omartorresrios Thanks, mate. I actually got rid of everything spaCy-related in my app, and now I just have these four lines in my docker-compose:

```yaml
spacyapi:
  image: hoblin/spacy-server:2-en_core_web_md
  ports:
    - '8000:8000'
```

It's a little lightweight image that I forked from the old, outdated (like anything in the NLP realm) public image to make a multi-platform build for my Kamal-based deploy. So now I just have a little lightning-fast API for tokenization.
@hoblin I am trying to deploy an app with Kamal and a wrapper around spaCy. I tried using your Docker image, but I must be missing something: when I call `Spacy::Language.new("en_core_web_md")`, it doesn't find the model.
Would you mind sharing a bit more about your solution?
Thanks
@adrienpoly Sure. I use it as a microservice and call it via its API.

`docker-compose.yml`:

```yaml
version: '3'
services:
  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    ports:
      - '8000:8000'
```
`config/deploy.yml` (Kamal):

```yaml
servers:
  web:
    ...
    options:
      network: networkname
  jobs:
    ...
    options:
      network: networkname

accessories:
  spacyapi:
    image: hoblin/spacy-server:2-en_core_web_md
    port: 8000
    roles:
      - web
      - jobs
    options:
      network: networkname
```
`lib/nlp/spacy.rb`:

```ruby
require "httparty"

module Nlp
  class Spacy
    HOST = Rails.env.development? ? "http://localhost:8000" : "http://projectname-spacyapi:8000"

    class << self
      def named_entities(text)
        HTTParty.post(
          "#{HOST}/ner",
          body: {sections: [text]}.to_json,
          headers: {"Content-Type" => "application/json"}
        ).dig("data", 0, "entities")
      end

      def tokens(text)
        HTTParty.post(
          "#{HOST}/pos",
          body: {text: text}.to_json,
          headers: {"Content-Type" => "application/json"}
        ).dig("data", 0, "tags")
      end
    end
  end
end
```
The image itself is a fork of https://github.com/neelkamath/spacy-server; the only difference from the original is the multi-platform build (amd64 + arm64). https://hub.docker.com/layers/hoblin/spacy-server/2-en_core_web_md/images/sha256-6c75186013463efca502b8621456fee8af0c3a75c2ea14e223674df9627a3327?context=repo
You can find the available endpoints here: https://github.com/neelkamath/spacy-server/blob/master/src/main.py#L62
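For reference, the `.dig("data", 0, ...)` calls in the wrapper class unwrap a JSON body shaped roughly like the one below. This is a minimal sketch: the `"data"`/`"tags"` nesting is assumed from the `dig` paths in the class, and the example values are invented, not captured from a real spacy-server response:

```ruby
require "json"

# Illustrative /pos response body; the nesting matches the
# dig("data", 0, "tags") call above, but the values are made up.
body = '{"data":[{"text":"Hello world","tags":[{"text":"Hello","tag":"UH"},{"text":"world","tag":"NN"}]}]}'

tags = JSON.parse(body).dig("data", 0, "tags")
puts tags.map { |t| t["tag"] }.inspect
```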
Apologies for the delayed response!
I've addressed this issue by adding a timeout feature to `Spacy::Language.new`.
Please check out the updated README for details and give the latest version a try. Hopefully, this will resolve the hanging problem. If you're still running into issues, @hoblin's microservice approach could be a great alternative to explore.
Thanks again for bringing this to our attention.
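For readers curious how a timeout guard like this works in general, here is a minimal sketch using Ruby's stdlib `Timeout`. This is an illustration of the concept only, not the gem's actual implementation or option name (check the README for the real API), and the `with_init_timeout` helper is hypothetical:

```ruby
require "timeout"

# Wrap a potentially hanging call so it raises instead of blocking the
# Puma/Sidekiq process forever; the caller can then fall back, e.g. to
# an HTTP microservice. Purely illustrative helper.
def with_init_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  nil
end

puts with_init_timeout(1) { :loaded }.inspect    # fast block succeeds
puts with_init_timeout(0.05) { sleep 1 }.inspect # slow block returns nil
```

One caveat: stdlib `Timeout` generally cannot interrupt code blocked inside a native extension, so a guard like this is not a drop-in fix for a hang that happens in C-level code.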
I built a little tokenizer in my app, and it works great when called from the Rails console. But once I run it from a background job (I tried both Sidekiq and Solid Queue) or from a server process (Puma) with an `after_create_commit` callback, it just hangs the process and I need to kill it. It looks similar to this issue: https://github.com/mrkn/pycall.rb/issues/95