Open loverma2 opened 4 months ago
Hi Malin,
Yes, I see the problem when a term is associated with multiple concepts (CUIs) (see also #36). The solution would be for clinlp to use spans instead of entities, and I think we should implement that change in a new release. Right now I'm not sure whether that would cause other problems, though, so we need to investigate further.
In the meantime, can you check whether you are using the latest version of clinlp (0.6.4)? I'm not 100% sure what exactly causes this error, so if you happen to have a minimal example (concepts and a sample text), that would help to resolve the issue.
Thanks, Vincent
Hi @loverma2, I just released version 0.8.0 of clinlp, which should be able to handle overlapping concepts.
You will need to make some small changes to your code to keep it working. You can find all changes here: https://github.com/umcu/clinlp/blob/main/CHANGELOG.md
The entities can be found in doc.spans['ents'] (rather than the previous doc.ents). By default, overlapping entities are kept, but you can also configure the pipeline to resolve overlap (it keeps the longest match, on the assumption that it is the most specific):
nlp.add_pipe('clinlp_rule_based_entity_matcher', config={'resolve_overlap': True})
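To illustrate what "takes longest" means, here is a minimal pure-Python sketch of longest-match overlap resolution. It is not clinlp's actual implementation. The function name resolve_overlap_longest, the (start, end, label) tuples and the CUI_* labels are hypothetical and used for illustration only.

```python
from typing import List, Tuple

# A span is (start_token, end_token, label); end is exclusive.
Span = Tuple[int, int, str]


def resolve_overlap_longest(spans: List[Span]) -> List[Span]:
    """Keep the longest span wherever spans overlap (hypothetical helper)."""
    # Consider longer spans first; break ties by the earliest start.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    taken = set()  # token positions already covered by a kept span
    for start, end, label in ordered:
        if any(pos in taken for pos in range(start, end)):
            continue  # overlaps a longer span that was already kept
        kept.append((start, end, label))
        taken.update(range(start, end))
    # Return the surviving spans in document order.
    return sorted(kept)


# Placeholder labels, not real UMLS CUIs.
spans = [
    (0, 2, "CUI_LONG"),   # two-token match
    (0, 1, "CUI_SHORT"),  # one-token match nested inside the first
    (3, 4, "CUI_OTHER"),  # separate match, no overlap
]
resolved = resolve_overlap_longest(spans)
print(resolved)
```

In this sketch the nested one-token match is dropped and the two non-overlapping matches are kept.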
Let me know if this helped at all, or if you need any help migrating to the latest version. We can always schedule a call or have a short meeting if that's helpful as well. Curious if this solved the problem, or whether some other issue still exists.
Best, Vincent
Awesome, thank you Vincent!
Hi colleagues,
For research purposes I have loaded the entire Dutch UMLS, which I would like to use to perform NER+L with clinlp. I aim to extract and link all entities in over 90,000 clinical reports (anamneses) of heart failure patients, so that I can use them for further research.
This is an example of what my concept2cuis dictionary looks like.
However, I run into an error when running the nlp pipeline.
I think this has to do with the fact that there can be either duplicate CUIs (keys) or duplicate concepts (values) in the UMLS. I would like the pipeline to be able to deal with this, because it makes clinical sense to have duplicates. For example, a list of matches or picking the first possible match would be great. Is this possible?
Thank you very much in advance and if you need more information please let me know!
Malin
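The two options asked for above (a list of matches, or the first possible match) can be sketched in plain Python by indexing terms to a list of CUIs instead of a single CUI. This is only an illustration under assumptions: the (CUI, term) pairs, the CUI_* placeholders and the helper build_term_index are hypothetical, and this is neither clinlp's API nor the concept dictionary from the post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_term_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each lowercased term to all CUIs it occurs with (hypothetical helper)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for cui, term in pairs:
        key = term.lower()
        if cui not in index[key]:  # skip exact duplicate (CUI, term) pairs
            index[key].append(cui)
    return dict(index)


# Placeholder data: one term linked to two CUIs, one CUI with two terms.
pairs = [
    ("CUI_1", "hartfalen"),
    ("CUI_1", "decompensatio cordis"),
    ("CUI_2", "hartfalen"),
    ("CUI_3", "dyspnoe"),
]

term_index = build_term_index(pairs)
all_matches = term_index["hartfalen"]     # option 1: keep every candidate CUI
first_match = term_index["hartfalen"][0]  # option 2: take the first candidate
print(all_matches, first_match)
```

Keeping the full list defers disambiguation to a later step, whereas taking the first candidate is simpler but depends on the order in which the pairs were loaded.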