stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

CollectionEncoder blocking on encoder N passages #322

Open paolomagnani-mxm opened 3 months ago

paolomagnani-mxm commented 3 months ago

Running ColBERT behind a Gunicorn server with shared memory, like this:

#!/bin/sh
gunicorn --chdir /app handler:app -w 4 --preload -k uvicorn.workers.UvicornWorker

apparently causes a deadlock when running the Torch inference session: https://github.com/stanford-futuredata/ColBERT/blob/852271661b22567e3720f2dd56b6d503613a3228/colbert/indexing/collection_encoder.py#L26

This problem was also explained here:

https://github.com/benoitc/gunicorn/issues/2478#issuecomment-749734412

It looks like that's exactly what has happened. The master gunicorn process used some API in libtorch that acquired a lock; when the process forked, that lock is still locked, and there is no way to unlock it.
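
To make that mechanism concrete, here is a minimal sketch using only the Python standard library. It assumes a POSIX system (it calls os.fork), and the plain threading.Lock is just a stand-in for whatever lock libtorch holds internally; the function name is made up for illustration.

```python
import os
import threading


def child_can_acquire_after_fork():
    """Fork while another thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()          # stand-in for a lock held inside libtorch (assumption)
    held = threading.Event()
    release = threading.Event()

    def holder():
        # This thread keeps the lock held while the main thread forks.
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    held.wait()                      # the lock is now definitely held

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread does not exist here, but the copied lock
        # is still marked as locked, so nobody can ever release it.
        got = lock.acquire(timeout=1.0)
        os._exit(0 if got else 1)

    # Parent: collect the child's answer, then clean up the holder thread.
    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("child acquired the lock:", child_can_acquire_after_fork())
```

In the child the acquire would be expected to time out, since the only thread that could release the lock was not carried across the fork. Recent Python versions may also warn about forking a multi-threaded process, which is the same hazard.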

Do you think it's possible to run the function encode_passages in a thread? Could this be the cause of the issue?

paolomagnani-mxm commented 3 months ago

Any ideas on this?