medspacy / QuickUMLS

System for Medical Concept Extraction
MIT License

Feature request: enable using in-memory database server #13

Closed by mstubna 4 months ago

mstubna commented 1 year ago

I am encountering an issue with unqlite when deploying medspacy to Cloud Run in GCP (see details below). I don't think this is a bug in unqlite or QuickUMLS per se, but possibly a limitation or bug in how Cloud Run's file system works. Everything runs fine when either a) using unqlite in a Linux VM or b) using the leveldb option in Cloud Run.

Given that support for leveldb was (appropriately) removed, I think a better alternative when running medspacy + quickumls in a distributed way in the cloud would be to make use of an in-memory database server like Redis for storing the UMLS data. This would also enable the medspacy Docker image file size to be dramatically reduced because the UMLS files wouldn't have to be included.
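To make the idea concrete, here is a minimal sketch of what a key-value-backed lookup could look like. It is not QuickUMLS code: `KeyValueCuiStore`, `FakeClient`, the `cui:` key prefix, and the stored value are all hypothetical names and data made up for illustration. The only things taken from the real setup are that values are pickled (as the `pickle.loads` call in the traceback below suggests) and that the client needs just `get` and `set`, which a `redis.Redis` client also provides.

```python
import pickle


class KeyValueCuiStore:
    """Hypothetical term -> CUIs lookup backed by any client with get/set.

    A redis.Redis client exposes get/set with bytes keys and values, so it
    could be passed in instead of the in-process fake used below.
    """

    def __init__(self, client, prefix="cui:"):
        self.client = client
        self.prefix = prefix  # hypothetical key namespace

    def _key(self, term):
        # Encode the term as bytes so the key works with a bytes-based store.
        return (self.prefix + term).encode("utf-8")

    def put(self, term, cuis):
        # Values are pickled, mirroring the pickle.loads call in the traceback.
        self.client.set(self._key(term), pickle.dumps(set(cuis)))

    def get(self, term):
        raw = self.client.get(self._key(term))
        # A missing key yields an empty set rather than an error.
        return pickle.loads(raw) if raw is not None else set()


class FakeClient:
    """In-process stand-in for a Redis client, used only for this demo."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


store = KeyValueCuiStore(FakeClient())
store.put("heart attack", ["C0027051"])  # illustrative value only
result = store.get("heart attack")
missing = store.get("unknown term")
```

With the `redis` package installed and a server reachable, passing a `redis.Redis(...)` client in place of `FakeClient()` should exercise the same two calls, although I have not tested that against QuickUMLS itself.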

Let me know if you have any thoughts!

  File "/usr/local/lib/python3.11/dist-packages/spacy/language.py", line 1016, in __call__
    error_handler(name, proc, [doc], e)
  File "/usr/local/lib/python3.11/dist-packages/spacy/util.py", line 1689, in raise_error
    raise e
  File "/usr/local/lib/python3.11/dist-packages/spacy/language.py", line 1011, in __call__
    doc = proc(doc, **component_cfg.get(name, {})) # type: ignore[call-arg]
  File "/usr/local/lib/python3.11/dist-packages/quickumls/spacy_component.py", line 161, in __call__
    matches = self.quickumls._match(doc, best_match=self.best_match, ignore_syntax=self.ignore_syntax)
  File "/usr/local/lib/python3.11/dist-packages/quickumls/core.py", line 474, in _match
    matches = self._get_all_matches(ngrams)
  File "/usr/local/lib/python3.11/dist-packages/quickumls/core.py", line 324, in _get_all_matches
    cuisem_match = sorted(self.cuisem_db.get(match))
  File "/usr/local/lib/python3.11/dist-packages/quickumls/toolbox.py", line 279, in get
    cuis = pickle.loads(self.cui_db_get(db_key_encode(term)))
  File "unqlite.pyx", line 408, in unqlite.UnQLite.fetch
  File "unqlite.pyx", line 414, in unqlite.UnQLite.fetch
  File "unqlite.pyx", line 490, in unqlite.UnQLite.check_call
unqlite.UnQLiteError: IO error while opening the target database file: /quickumls/cui-semtypes.db/cui.unqlite

jianlins commented 4 months ago

With the current version, you should be able to use a custom path for the unqlite db.