aarongiera opened this issue 9 months ago
Why would you prefer a list over a set when adding new matches to `span._.umls_matches`? Could that cause duplicate matches? Is the set what causes the serialization issue? Can you add test functions?
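A list can still avoid duplicates if each new match is checked for membership before it is appended. Below is a minimal sketch. It assumes a match can be represented as a tuple `(cui, start_char, end_char)`. The tuple layout, the placeholder CUIs and the `add_unique_match` helper are all hypothetical and are not QuickUMLS's actual match format.

```python
from typing import List, Tuple

# Hypothetical match representation: (cui, start_char, end_char).
Match = Tuple[str, int, int]


def add_unique_match(matches: List[Match], new_match: Match) -> List[Match]:
    """Append new_match only if an equal match is not already in the list.

    The container stays a list, but repeated matches are skipped. The
    membership test is O(n), which is acceptable for the small number of
    matches a single span usually carries.
    """
    if new_match not in matches:
        matches.append(new_match)
    return matches


matches: List[Match] = []
# Placeholder CUIs; the second entry is an exact duplicate of the first.
for m in [("C0000001", 0, 8), ("C0000001", 0, 8), ("C0000002", 13, 25)]:
    add_unique_match(matches, m)

print(matches)  # [('C0000001', 0, 8), ('C0000002', 13, 25)]
```

Insertion order is preserved, which a set would not guarantee.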
Did you implement the `to_disk` and `from_disk` functions?
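For reference, spaCy's custom-component serialization convention is a `to_disk(path, exclude=...)` method that writes into a directory and a `from_disk(path, exclude=...)` method that reads it back and returns `self`. The sketch below shows that shape using only the standard library so it runs without spaCy. The class name `HypotheticalUmlsComponent`, the file name `cfg.json` and the config keys are hypothetical, chosen for illustration. It is not the actual QuickUMLS component.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, Optional, Union


class HypotheticalUmlsComponent:
    """Stand-in component mirroring spaCy's to_disk/from_disk method shape."""

    def __init__(self, cfg: Optional[Dict[str, Any]] = None) -> None:
        self.cfg: Dict[str, Any] = dict(cfg or {})

    def to_disk(self, path: Union[str, Path], exclude: Iterable[str] = tuple()) -> None:
        # Write the component's settings into its own directory.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        if "cfg" not in exclude:
            (path / "cfg.json").write_text(json.dumps(self.cfg), encoding="utf8")

    def from_disk(
        self, path: Union[str, Path], exclude: Iterable[str] = tuple()
    ) -> "HypotheticalUmlsComponent":
        # Read the settings back and return self, as spaCy expects.
        path = Path(path)
        if "cfg" not in exclude:
            self.cfg = json.loads((path / "cfg.json").read_text(encoding="utf8"))
        return self


with tempfile.TemporaryDirectory() as tmp:
    original = HypotheticalUmlsComponent({"threshold": 0.7, "window": 5})
    original.to_disk(Path(tmp) / "umls_component")
    restored = HypotheticalUmlsComponent().from_disk(Path(tmp) / "umls_component")
    print(restored.cfg)  # {'threshold': 0.7, 'window': 5}
```

A real component would also need to persist or re-locate any large resources, such as a matcher database, rather than only a small config dict.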
Span extensions need to be serializable for spaCy's multiprocessing to work. Currently, a serialization error occurs when the QuickUMLS pipeline component is added and `nlp.pipe` is called with `n_process > 1`.
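One plausible cause relates to how spaCy passes docs between worker processes. If spaCy serializes the extension values stored in `Doc.user_data` with msgpack when it does this, a set-valued `umls_matches` would fail, because msgpack has no set type, while a list would encode fine. That serialization behavior is an assumption here. The sketch below reproduces the same failure using the standard-library `json` module as a dependency-free stand-in, since `json` also has no set type. `is_serializable` is a hypothetical helper.

```python
import json
from typing import Any


def is_serializable(value: Any) -> bool:
    """Return True if json can encode `value`.

    json stands in for msgpack here: neither has a native set type, so a
    set raises TypeError while a list (even a list of tuples) encodes fine.
    """
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


matches_as_set = {("C0000001", 0, 8)}        # placeholder match tuple
matches_as_list = sorted(matches_as_set)     # same content, stored as a list

print(is_serializable(matches_as_set))   # False
print(is_serializable(matches_as_list))  # True
```

Converting the set to a list before assigning it to the extension keeps the values encodable. Sorting makes the order deterministic.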