qwaider / heideltime

Automatically exported from code.google.com/p/heideltime

overlapping timexes produce broken XML in the standalone #5

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When the standalone version of HeidelTime is used to tag a document that 
contains multiple overlapping temporal expressions, the TimeML writer module 
produces invalid XML.

Due to the nature of inline tags in TimeML documents, this condition cannot be 
resolved entirely satisfactorily; two overlapping timexes would produce 
overlapping XML tags which would be semantically invalid.

*Ideally*, two overlapping timexes should never occur: if a temporal 
expression produces two overlapping timexes, it should also be representable 
by a single timex that spans both of the smaller ones. Whether such a timex is 
recognized, however, depends on the resources/rules in use and on whether they 
include such a "larger" rule.

Text from other domains, such as poetry, can however contain unexpected 
sentence syntax that eludes even rulesets otherwise thought of as 
comprehensive.

To resolve the bug that produces broken XML/TimeML tags, we will, for 
overlapping timexes, only create an XML tag for the first recognized timex, 
omitting all of the subsequent timexes that overlap with the first one.
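
A minimal sketch of the filtering step described above, for illustration only. 
It is not the code from the actual changeset: the `OverlapFilter` and `Timex` 
names, the half-open character offsets, and the reading of "first recognized" 
as "earliest in the input list" are all assumptions. A timex is dropped if it 
overlaps any timex that has already been kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HeidelTime's API.
class OverlapFilter {

    // Hypothetical stand-in for a recognized timex: offsets are [begin, end).
    static final class Timex {
        final int begin;
        final int end;
        final String id;

        Timex(int begin, int end, String id) {
            this.begin = begin;
            this.end = end;
            this.id = id;
        }

        // Two half-open spans overlap if each starts before the other ends.
        boolean overlaps(Timex other) {
            return begin < other.end && other.begin < end;
        }
    }

    // Walks the timexes in recognition order (assumed to be list order) and
    // keeps one only if it does not overlap a timex that was already kept.
    static List<Timex> keepFirstOfOverlapping(List<Timex> recognized) {
        List<Timex> kept = new ArrayList<>();
        for (Timex candidate : recognized) {
            boolean clashes = false;
            for (Timex earlier : kept) {
                if (candidate.overlaps(earlier)) {
                    clashes = true;
                    break;
                }
            }
            if (!clashes) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Timex> recognized = new ArrayList<>();
        recognized.add(new Timex(0, 10, "t1"));
        recognized.add(new Timex(5, 15, "t2"));  // overlaps t1, so it is omitted
        recognized.add(new Timex(15, 20, "t3")); // starts where t2 ends; no clash with t1
        for (Timex t : keepFirstOfOverlapping(recognized)) {
            System.out.println(t.id);
        }
    }
}
```

Only the kept timexes would then be written as inline tags, so no two tags in 
the output can overlap. The cost is that the omitted timexes are lost from the 
XML/TimeML output.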

Our thanks go to Armin Hoenen for bringing this bug to our attention.

Original issue reported on code.google.com by jul...@gmail.com on 31 Jul 2012 at 12:19

GoogleCodeExporter commented 9 years ago
Added code to prevent overlapping, subsequent timexes from being tagged in the 
XML/TimeML output with changeset r8ab4f0e23482.

Tried it out on a use case that previously resulted in invalid XML code; bug 
seems fixed.

Will test some more before merging it to default.

Original comment by jul...@gmail.com on 31 Jul 2012 at 1:19

GoogleCodeExporter commented 9 years ago
Did some more testing; works as expected. Merging to default.

Original comment by jul...@gmail.com on 31 Jul 2012 at 11:30

GoogleCodeExporter commented 9 years ago

Original comment by j.z...@stud.uni-heidelberg.de on 18 Apr 2013 at 10:11