medspacy / QuickUMLS

System for Medical Concept Extraction
MIT License

fix: Add serialization for UmlsMatch #20

Open aarongiera opened 4 months ago

aarongiera commented 4 months ago

Span extensions need to be serializable for spaCy's multiprocessing to work. Currently, a serialization error occurs when the quickumls pipeline is run through nlp.pipe with n_process > 1.
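A minimal sketch of the failure mode, under an assumption this thread does not confirm: that extension values are passed between processes through a serializer that accepts only plain containers (lists, dicts, strings, numbers). The standard-library `json` module stands in for that serializer here, and the `UmlsMatch` fields, the placeholder CUI, and the `can_serialize` helper are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UmlsMatch:
    # Hypothetical fields; the real class in this repository may differ.
    cui: str
    ngram: str
    similarity: float

    def to_dict(self) -> dict:
        """Convert the match to a plain dict of built-in types."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "UmlsMatch":
        """Rebuild a match from the dict produced by to_dict."""
        return cls(**data)


def can_serialize(value) -> bool:
    """Return True if the stand-in serializer (json) accepts the value."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


# "C0000000" is a placeholder, not a real concept identifier.
match = UmlsMatch(cui="C0000000", ngram="example term", similarity=1.0)

as_set = {match}             # a set of custom objects
as_list = [match.to_dict()]  # a list of plain dicts

# Round-trip the list form and rebuild the objects on the other side.
restored = [UmlsMatch.from_dict(d) for d in json.loads(json.dumps(as_list))]
```

In this sketch the set of objects is rejected while the list of dicts survives a round trip, which is the kind of change the PR title describes.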

jianlins commented 4 months ago

Why would you prefer a List over a Set when adding new matches to span._.umls_matches? Could that cause duplicate matches? Is the Set what causes the serialization issue? Can you add test functions?
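For reference, one way a list could still avoid duplicates is to check membership before appending. This is only a sketch with plain dicts; the helper name `add_match` and the dict keys are hypothetical and not taken from the PR.

```python
def add_match(matches: list, new_match: dict) -> list:
    """Append new_match only if an equal dict is not already in the list."""
    if new_match not in matches:
        matches.append(new_match)
    return matches


# Placeholder values, used only to exercise the helper.
matches = []
add_match(matches, {"cui": "C0000000", "ngram": "example term"})
add_match(matches, {"cui": "C0000000", "ngram": "example term"})  # duplicate, skipped
add_match(matches, {"cui": "C0000001", "ngram": "other term"})
```

The membership test is linear in the list length, which is likely acceptable for the handful of matches attached to a single span, and insertion order is preserved.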

jianlins commented 4 months ago

Did you implement the to_disk and from_disk functions?
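As a rough illustration of what such a pair of methods usually looks like, here is a standard-library sketch. The class `MatcherSettings`, its attributes, and the file name `settings.json` are hypothetical; the method signatures follow the common spaCy component convention (a path plus a keyword-only `exclude`), and nothing here is taken from the PR itself.

```python
import json
import tempfile
from pathlib import Path


class MatcherSettings:
    """Hypothetical stand-in for the state a pipeline component would save."""

    def __init__(self, threshold: float = 0.7, window: int = 5):
        self.threshold = threshold
        self.window = window

    def to_disk(self, path, *, exclude=tuple()):
        # Write the settings as JSON inside the given directory.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        data = {"threshold": self.threshold, "window": self.window}
        (path / "settings.json").write_text(json.dumps(data), encoding="utf8")

    def from_disk(self, path, *, exclude=tuple()):
        # Read the settings back and return self so calls can be chained.
        text = (Path(path) / "settings.json").read_text(encoding="utf8")
        data = json.loads(text)
        self.threshold = data["threshold"]
        self.window = data["window"]
        return self


# Save one instance and load its state into a fresh one.
with tempfile.TemporaryDirectory() as tmp:
    MatcherSettings(threshold=0.9, window=3).to_disk(tmp)
    loaded = MatcherSettings().from_disk(tmp)
```

These methods cover saving a pipeline to disk; they are a separate concern from the per-span data that has to cross process boundaries during multiprocessing.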