mrkn / pycall.rb

Calling Python functions from the Ruby language
MIT License

SEGV when python code is pycall-ed from Puma (works without puma) #185

Closed snickell closed 1 month ago

snickell commented 1 month ago

We are hitting an issue where we have code that works from simple irb, or from rails console, but crashes when run from a Rails controller. I have reduced our issue to a minimal repro that uses only Puma. I'm comfortable with C and C-debugging tools if that helps, I'm just trying to figure out where to start.

Minimal repro (tried to make it very simple): https://github.com/snickell/pycall_puma_crash

We'd like to use pycall.rb for code.org (github). It's a very clever approach, thank you @mrkn 🙇

snickell commented 1 month ago

I suspect this is the same issue as https://github.com/mrkn/pycall.rb/issues/175, but unlike that issue I have reduced it to only Puma, PyCall, and one Python module. I will work on finding a simpler Python module, because llmguard is complicated. It would be great to find a repro that only uses pandas (hinted at by #175).

snickell commented 1 month ago

I've got a pandas-only repro now and am updating the main comment to match.

snickell commented 1 month ago

Aha! Even when you set threads=1, Puma still handles requests on a different thread from the one that ran the initial code (where PyCall was loaded). This will affect Rails users as well:

# Setup our local venv (using pdm, in .venv)
ENV['PYTHON'] = `pdm run which python`.strip
site_dir = `pdm run python -c 'import site; print(site.getsitepackages()[0])'`.strip

require 'pycall'
$pycall_thread_id = Thread.current.object_id

# This is to setup our local venv
site = PyCall.import_module('site')
site.addsitedir(site_dir)

module CrashPuma

  def self.do_crash
    raise "Thread IDs did not match: started with thread #{$pycall_thread_id}, but request is on thread #{Thread.current.object_id}" if $pycall_thread_id != Thread.current.object_id
    # => "Thread IDs did not match...."

    puts "About to crash (if running in puma)"

    pandas = PyCall.import_module('pandas')
    data = pandas.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep: ';')
    puts data.head()

    puts "IT DID NOT CRASH"
  end
end

Conclusion: there may not be a safe way to use PyCall from Puma-based servers, including a default Rails configuration. Puma always starts a separate thread for requests, even if threads=1.
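
If the thread mismatch really is the trigger, one pattern that might be worth trying is to funnel every Python call through a single dedicated Ruby thread, so that request threads never touch PyCall directly. The following is only a minimal sketch of that idea using plain Ruby threads and queues. `SingleThreadRunner` is a hypothetical helper, not part of PyCall or Puma, and it has not been tested against this crash.

```ruby
# Hypothetical helper (not part of PyCall or Puma): runs every submitted
# block on one dedicated worker thread and hands the result back to the caller.
class SingleThreadRunner
  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty, ending the loop.
      while (job = @jobs.pop)
        block, reply = job
        begin
          reply << [:ok, block.call]
        rescue StandardError => e
          # Pass the error back so it is raised in the calling thread.
          reply << [:error, e]
        end
      end
    end
  end

  # Runs the block on the worker thread; the caller waits for the result.
  def run(&block)
    reply = Queue.new
    @jobs << [block, reply]
    status, value = reply.pop
    raise value if status == :error
    value
  end

  # Stops accepting work and waits for the worker thread to finish.
  def shutdown
    @jobs.close
    @worker.join
  end
end

runner = SingleThreadRunner.new

# Both calls report the worker's thread id, even though the second one
# is submitted from a different thread (as a Puma request would be).
id_from_main  = runner.run { Thread.current.object_id }
id_from_other = Thread.new { runner.run { Thread.current.object_id } }.value
puts id_from_main == id_from_other

runner.shutdown
```

In the repro above, the body of `do_crash` would go inside a `runner.run { ... }` block. The assumption here is that `require 'pycall'` and the `site` setup would also have to run on the worker thread, since that is the thread that would own the Python interpreter.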

snickell commented 1 month ago

See: https://github.com/mrkn/pycall.rb/issues/96