It looks like that's exactly what has happened. The master gunicorn process used some API in libtorch that acquired a lock; when the process forked, that lock is still locked, and there is no way to unlock it.
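For illustration, here is a minimal pure-Python sketch of the mechanism described above. It is only an analogy: the lock in libtorch is a native one, while this uses `threading.Lock`, and the helper name `child_can_acquire_after_fork` is made up for this example. A background thread holds a lock while the process forks; the child gets a copy of the locked lock but not the thread that would release it. It assumes a POSIX system where `os.fork` is available (recent Python versions may print a deprecation warning about forking a multi-threaded process).

```python
import os
import threading


def child_can_acquire_after_fork(timeout: float = 1.0) -> bool:
    """Fork while another thread holds a lock; report whether the child could acquire it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder() -> None:
        # Take the lock and keep it held until the parent says otherwise.
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # make sure the lock is really held before forking

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so the holder thread
        # does not exist here and nothing can ever release the lock.
        os.close(read_fd)
        got = lock.acquire(timeout=timeout)
        os.write(write_fd, b"1" if got else b"0")
        os._exit(0)

    # Parent: read the child's verdict, then clean up.
    os.close(write_fd)
    verdict = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    release.set()
    t.join()
    return verdict == b"1"


if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire_after_fork())
```

Without the timeout, the child would simply block forever on `lock.acquire()`, which is the kind of hang a forked worker would see if it inherited a lock in that state.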
Running ColBERT with a Gunicorn server with shared memory, like this:
apparently causes a deadlock when running the Torch inference session: https://github.com/stanford-futuredata/ColBERT/blob/852271661b22567e3720f2dd56b6d503613a3228/colbert/indexing/collection_encoder.py#L26
This problem was also explained here:
https://github.com/benoitc/gunicorn/issues/2478#issuecomment-749734412
Do you think it's possible to run the function `encode_passages` in a thread? Could this be the cause of the issue?