tarsqi / ttk

Tarsqi Toolkit
Apache License 2.0

duplicate event imports #62

Open marcverhagen opened 7 years ago

marcverhagen commented 7 years ago

The Evita component that imports existing code has a problem with phrases like "hypodense lesion within the caudate lobe" and imports that event twice. The reason is that the event is spread out over two chunks:

[hypodense lesion]ng within [the caudate lobe]ng

Evita considers both 'lesion' and 'lobe' as events, and in both cases it finds the imported event and installs it on the chunk. We should limit event import to those cases where the head of the imported event falls within the chunk Evita is looking at. However, this will miss cases where the head of the imported event does not appear in any chunk.
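A minimal sketch of the head-in-chunk check, for illustration only. The `Span` and `ImportedEvent` classes and the `should_import` function are hypothetical and are not taken from the TTK code; character offsets are assumed to be end-exclusive, and 'lesion' is assumed to be the head of the imported event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical character span with an exclusive end offset."""
    begin: int
    end: int

    def contains(self, other: "Span") -> bool:
        return self.begin <= other.begin and other.end <= self.end


@dataclass(frozen=True)
class ImportedEvent:
    """Hypothetical imported event: its full extent plus the span of its head."""
    extent: Span
    head: Span


def should_import(event: ImportedEvent, chunk: Span) -> bool:
    # Install the event on this chunk only if the chunk covers the event head.
    return chunk.contains(event.head)


text = "hypodense lesion within the caudate lobe"
# Assumption: 'lesion' (offsets 10-16) is the head of the imported event.
event = ImportedEvent(extent=Span(0, 40), head=Span(10, 16))
# [hypodense lesion]ng and [the caudate lobe]ng
chunks = [Span(0, 16), Span(24, 40)]

imported_on = [chunk for chunk in chunks if should_import(event, chunk)]
```

With this check the event would be installed on the first chunk only. An event whose head lies outside every chunk would not be imported at all, which is the limitation noted above.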

Another option is to do a post screening of all events added and remove duplicates.
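The post screening could look roughly like the sketch below. It assumes, purely for illustration, that each added event can be reduced to a `(begin, end, event_id)` tuple and that two events with the same offsets count as duplicates; neither the tuple layout nor the `remove_duplicate_events` function comes from the TTK code.

```python
def remove_duplicate_events(events):
    """Keep the first event seen for each (begin, end) extent, drop the rest."""
    seen = set()
    unique = []
    for begin, end, event_id in events:
        key = (begin, end)
        if key in seen:
            # Same extent was already added from another chunk.
            continue
        seen.add(key)
        unique.append((begin, end, event_id))
    return unique


# The same imported event installed once per chunk, plus an unrelated event.
added = [(0, 40, "e1"), (0, 40, "e2"), (50, 58, "e3")]
deduped = remove_duplicate_events(added)
```

Unlike the head-in-chunk check, this keeps events whose head does not fall inside any chunk, at the cost of an extra pass over all added events.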

Incidentally, this issue also causes problems for the alignment code in testing/evaluate.py, resulting in the following alignment:

hypodense lesion within the caudate lobe - hypodense lesion within the caudate lobe
None                                     - hypodense lesion within the caudate lobe

and then counting the second alignment as a false positive.

marcverhagen commented 7 years ago

This is mostly solved after some additions to the chunker, but some of the above suggestions and other changes may be worthwhile: