noelslice closed this 3 years ago
Disclaimer: I'm still not convinced the logic in `extract_mentions_spans` and `_extract_from_sent` is robust. I'm still working on my understanding of the code. It would help to add some test cases.
Thanks for this PR @noelslice! Looks good to me. There are definitely parts of the code base that could use more test cases - all contributions welcome!
Thanks for having a look and merging this in, @svlandeg!
Using the same example input mentioned here (https://github.com/huggingface/neuralcoref/issues/215#issuecomment-568702452), there seems to be a spurious mention "than Shyam", because the subordinating conjunction "than" was not excluded during mention span detection.
This PR adds the SCONJ tag to the REMOVE_POS list.
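To illustrate the idea, here is a minimal, self-contained sketch of how a POS-based removal list can drop a leading conjunction from a candidate span. It is not the neuralcoref implementation: `trim_span`, the two example sets, and the hand-written `(text, pos)` tokens are all hypothetical, and the actual contents of `REMOVE_POS` may differ.

```python
# Hypothetical stand-ins for the removal list before and after this PR.
# The real REMOVE_POS contents in neuralcoref may differ (assumption).
REMOVE_POS_BEFORE = {"CCONJ", "INTJ", "ADP"}
REMOVE_POS_AFTER = REMOVE_POS_BEFORE | {"SCONJ"}


def trim_span(tokens, remove_pos):
    """Strip leading and trailing (text, pos) tokens whose POS is in remove_pos."""
    start, end = 0, len(tokens)
    # Advance past unwanted tokens at the left edge of the span.
    while start < end and tokens[start][1] in remove_pos:
        start += 1
    # Back up past unwanted tokens at the right edge of the span.
    while end > start and tokens[end - 1][1] in remove_pos:
        end -= 1
    return " ".join(text for text, _ in tokens[start:end])


# Hand-tagged candidate span; the tags are assumed, not produced by spaCy here.
candidate = [("than", "SCONJ"), ("Shyam", "PROPN")]

print(trim_span(candidate, REMOVE_POS_BEFORE))  # than Shyam
print(trim_span(candidate, REMOVE_POS_AFTER))   # Shyam
```

With `SCONJ` in the set, "than" is stripped from the edge of the candidate, so the span "than Shyam" can no longer be produced. How the remaining tokens are handled afterwards depends on the rest of the mention extraction logic, which this sketch does not model.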
Test case:
Current output:
New output ("than Shyam" excluded):
The live demo also doesn't display this mention: