tarsqi / ttk

Tarsqi Toolkit
Apache License 2.0

Insert TIMEX3 tags from GUTime #75

Closed marcverhagen closed 7 years ago

marcverhagen commented 7 years ago

The process for inserting tags from GUTime is not very smart. It uses the tree.Node.insert() method, which is not really intended for this purpose and cannot deal with new TIMEX3 tags whose boundaries do not line up with the chunks from the preprocessor.

First, make sure that insert() does something more helpful than just printing a generic warning when it fails to insert a tag. Then figure out a way to deal with more cases.
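As a rough illustration of what a more helpful message could look like, here is a minimal standalone sketch. It is not TTK code: `span_of` and `describe_insert_failure` are hypothetical helpers, and tags and chunks are modeled as plain `(name, begin, end)` tuples with end-exclusive character offsets rather than as TarsqiTree nodes.

```python
def span_of(text, phrase):
    """Return the (begin, end) character offsets of phrase in text (end-exclusive)."""
    begin = text.index(phrase)
    return begin, begin + len(phrase)


def describe_insert_failure(text, tag, chunks):
    """Return None if the tag nests cleanly with every chunk, else a message.

    Hypothetical helper: tag and chunks are (name, begin, end) tuples.
    A tag cannot be inserted into a tree if it overlaps a chunk without
    one of the two fully containing the other.
    """
    name, begin, end = tag
    for cname, cbegin, cend in chunks:
        overlaps = begin < cend and cbegin < end
        nested = (cbegin <= begin and end <= cend) or (begin <= cbegin and cend <= end)
        if overlaps and not nested:
            return ("Cannot insert %s %d-%d ('%s'): it crosses the boundary of "
                    "%s chunk %d-%d ('%s')"
                    % (name, begin, end, text[begin:end],
                       cname, cbegin, cend, text[cbegin:cend]))
    return None


text = "for the fourth quarter ended Aug. 26."
chunks = [("NG",) + span_of(text, "for the fourth quarter ended Aug.")]
bad_tag = ("TIMEX3",) + span_of(text, "Aug. 26")
good_tag = ("TIMEX3",) + span_of(text, "fourth quarter")

bad_message = describe_insert_failure(text, bad_tag, chunks)
good_message = describe_insert_failure(text, good_tag, chunks)
print(bad_message)
print(good_message)
```

Naming the offending tag, the chunk it collides with, and both text spans makes it possible to see from the log alone which chunking decisions need to be revisited.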

marcverhagen commented 7 years ago

Probably the best way to deal with this is to update the chunking given the results of GUTime. For example, the phrase "for the fourth quarter ended Aug. 26." (wsj_0263) is chunked as follows:

[for the fourth quarter ended Aug.]NG 26.

So when GUTime finds Aug. 26, it does enter it into the TagRepository, but the tag is not added to the TarsqiTree when later components run, and it therefore cannot be linked. If we could adjust the chunking, we would end up with one of the following TarsqiTree fragments:

[for the [fourth quarter]timex3 ended [Aug. 26]timex3]NG.
for the [[fourth quarter]timex3]NG ended [[Aug. 26]timex3]NG.

The second one is probably the more useful one.
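One possible way to arrive at the second fragment is sketched below. This is an assumption about how the re-chunking could work, not the approach taken in TTK: `rechunk` is a hypothetical function, and chunks and TIMEX3 tags are reduced to `(begin, end)` character spans with end-exclusive offsets. The rule is to drop every chunk that crosses a TIMEX3 boundary and then give each TIMEX3 that is no longer inside a chunk an NG chunk of its own.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    overlaps = a[0] < b[1] and b[0] < a[1]
    nested = (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])
    return overlaps and not nested


def rechunk(chunks, timexes):
    """Return a new sorted list of chunk spans compatible with the timex spans.

    Hypothetical sketch: chunks that cross any timex are discarded, and
    each timex not contained in a surviving chunk becomes its own chunk.
    """
    kept = [c for c in chunks if not any(crosses(c, t) for t in timexes)]
    for t in timexes:
        if not any(c[0] <= t[0] and t[1] <= c[1] for c in kept):
            kept.append(t)
    return sorted(kept)


def span_of(text, phrase):
    """Return the (begin, end) character offsets of phrase in text."""
    begin = text.index(phrase)
    return begin, begin + len(phrase)


text = "for the fourth quarter ended Aug. 26."
chunks = [span_of(text, "for the fourth quarter ended Aug.")]
timexes = [span_of(text, "fourth quarter"), span_of(text, "Aug. 26")]

new_chunks = rechunk(chunks, timexes)
print([text[b:e] for b, e in new_chunks])
```

A chunk that fully contains a TIMEX3 and crosses no other is left alone, so only the chunks that actually block insertion are replaced. Note that this sketch drops the remainder of a discarded chunk ("for the", "ended") rather than re-chunking it.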

marcverhagen commented 7 years ago

Closed this, but added a new issue specific to the chunking in #76.