@yfpeng I figured out a better solution.
https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/negdetect.py#L83
for name, matcher, loc in detector.detect(sentence, locs):
Instead of passing the locs for the entire passage, we can pass only the unique locs that belong to the current sentence.
https://github.com/ncbi-nlp/NegBio/blob/master/negbio/neg/neg_detector.py#L47
for node in find_nodes(g, loc[0], loc[1]):
At present, all the locs that don't belong to the current sentence are processed unnecessarily.
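For reference, a minimal sketch of the filtering I applied locally (the loop shape follows negdetect.py linked above; `sentence.offset` and `sentence.text` are standard BioC fields):

```python
for sentence in passage.sentences:
    sent_start = sentence.offset
    sent_end = sentence.offset + len(sentence.text)
    # Keep only the unique CUI spans that fall inside this sentence,
    # instead of the full list collected for the whole passage.
    sentence_locs = sorted({(s, e) for (s, e) in locs
                            if sent_start <= s and e <= sent_end})
    for name, matcher, loc in detector.detect(sentence, sentence_locs):
        ...
```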
https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/negdetect.py#L85
_mark_anns(passage.annotations, loc[0], loc[1], name)
Likewise, annotations belonging to other sentences are checked for overlap with the loc for which negation/uncertainty was found.
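That check can be limited to the current sentence as well, e.g. (continuing the sketch above, inside the inner loop; `total_span` is the bioc annotation's total-span property):

```python
        # Only annotations inside the current sentence can overlap loc,
        # so there is no need to scan the rest of the passage.
        sentence_anns = [a for a in passage.annotations
                         if sent_start <= a.total_span.offset < sent_end]
        _mark_anns(sentence_anns, loc[0], loc[1], name)
```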
I have made the above changes in my local copy.
In a CT report with around 60 sentences, these changes reduced the following step from 25 seconds to 14 seconds:
https://github.com/ncbi-nlp/NegBio/blob/master/negbio/main_mm.py#L60
document = negdetect.detect(document, neg_detector)
Thank you!
https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/negdetect.py#L73-L76
Here the location ranges of the CUIs are collected, some of which can be duplicates. This happens because MetaMap creates multiple CUIs for the same text span.
neg_detector.py's detect() then iterates over these locs in a for loop: https://github.com/ncbi-nlp/NegBio/blob/master/negbio/neg/neg_detector.py#L44
Wouldn't it be better to remove the duplicates from locs in negdetect.py, e.g. by converting the list to a set?
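A short sketch of what I have in mind (the spans below are hypothetical):

```python
# Hypothetical spans as collected in negdetect.py#L73-L76: two CUIs over
# the same text span yield identical (start, end) pairs.
locs = [(120, 135), (120, 135), (200, 210)]
# Drop the duplicates; sorting keeps the iteration order deterministic.
locs = sorted(set(locs))  # [(120, 135), (200, 210)]
```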
An example of duplicate loc elements: for one of the sentences, the following two CUIs are generated in the same location span: