microth / heideltime

Automatically exported from code.google.com/p/heideltime

Incorrect value for decades/centuries? #14

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I decided to update to 1.5 and noticed that something has changed.
If I submit this sentence:
"Near the southern end, signs saying 'Hatfield and the North' inspired the 
eponymous 1970s rock band Hatfield and the North."
The date "1970s" is now tagged: <TIMEX3 tid="t86" type="DATE" value="197">1970s</TIMEX3>
Before, it was <TIMEX3 tid="t85" type="DATE" value="197X">1970s</TIMEX3>, 
which makes more sense (no ambiguity with the year 197). 

The same with centuries:
now: <TIMEX3 tid="t4" type="DATE" value="11">the 12th century</TIMEX3>
before: <TIMEX3 tid="t3" type="DATE" value="11XX">the 12th century</TIMEX3>
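
To make the difference concrete, here is a minimal sketch of how the two value formats shown above relate. The helper `strip_placeholders` is hypothetical and not part of HeidelTime; it only assumes what the four examples show, namely that the older values are four characters long and pad the decade or century digits with trailing `X` placeholders.

```python
import re

# Hypothetical helper, not part of HeidelTime.
# Assumption: an old-style decade/century value is 1-3 digits followed by
# 1-3 "X" placeholders, four characters in total (e.g. "197X", "11XX").
_OLD_STYLE = re.compile(r"^(\d{1,3})(X{1,3})$")


def strip_placeholders(value: str) -> str:
    """Return the new-style value for an old-style decade/century value.

    Any value that does not match the assumed old-style shape is
    returned unchanged.
    """
    match = _OLD_STYLE.match(value)
    if match and len(value) == 4:
        return match.group(1)
    return value


print(strip_placeholders("197X"))  # decade example from this report
print(strip_placeholders("11XX"))  # century example from this report
```

Values that merely contain an `X` elsewhere (for example a date with an unknown month) are deliberately left alone by this sketch.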

Original issue reported on code.google.com by damien.p...@gmail.com on 4 Feb 2014 at 4:19

GoogleCodeExporter commented 9 years ago
I saw in the resources file that this seems to be because TempEval-3 values 
shouldn't have X at the end, whereas TIMEX3 values should. Is that right?
In that case, an option in the configuration file to choose between the TIMEX3 
and TempEval formats would be nice.

Original comment by damien.p...@gmail.com on 4 Feb 2014 at 4:25

GoogleCodeExporter commented 9 years ago
Hi Damien,

Thanks for your mail. We actually made these changes because we want the 
annotations to be closer to the TimeML TIMEX3 annotation standard. The 
values are defined as follows:
- decade expressions: three-digit numbers (e.g., "197" for "1970s")
- century expressions: two-digit numbers (e.g., "11" for "the 12th century")
- millennium expressions: one-digit numbers 
- values for expressions referring to years such as "200 (AD)" and "20 (AD)" 
are to be annotated as "0200" and "0020", respectively
- values for expressions referring to years such as "200 BC" and "20 BC" are 
to be annotated as "BC0200" and "BC0020", respectively.
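
As an illustration of the rules listed above, the following sketch reads the granularity of a value from its digit count and zero-pads years to four digits. The names `describe_value` and `format_year` are hypothetical and do not come from HeidelTime. Allowing the "BC" prefix in front of values shorter than four digits is an assumption; the list above only shows it on years.

```python
import re

# Hypothetical helpers for illustration only; not HeidelTime code.
# Assumption: a value is an optional "BC" prefix followed by 1-4 digits.
_VALUE = re.compile(r"^(BC)?(\d{1,4})$")

# Digit count -> granularity, following the list above.
_GRANULARITY = {1: "millennium", 2: "century", 3: "decade", 4: "year"}


def describe_value(value):
    """Return (era, granularity, digits), or None for other value shapes."""
    match = _VALUE.match(value)
    if match is None:
        return None
    era = "BC" if match.group(1) else "AD"
    digits = match.group(2)
    return era, _GRANULARITY[len(digits)], digits


def format_year(year, bc=False):
    """Zero-pad a year to four digits, with a "BC" prefix if requested."""
    prefix = "BC" if bc else ""
    return f"{prefix}{year:04d}"


print(describe_value("197"))      # decade, as in "1970s"
print(describe_value("BC0200"))   # year 200 BC
print(format_year(20, bc=True))   # year 20 BC
```

Because a year always occupies four digits, a three-digit value such as "197" can only be read as a decade under these rules.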

Currently, we are working on a new version supporting the extraction and 
normalization of historic temporal expressions (e.g., BC date expressions). For 
this, it is quite important that we now stick to the annotation standard more 
closely.

Now that the semantics of the values are explained: do you think you can 
use the value normalization as it is now? 

Please keep us in the loop.

Thank you and best regards,
Jannik

Original comment by jannik.s...@gmail.com on 4 Feb 2014 at 4:38

GoogleCodeExporter commented 9 years ago
Hi Jannik,

Thanks for your quick answer.
I see. A year is always written with 4 digits, so there is no possible 
confusion when a decade is written with 3 digits.

Yes, thanks for your detailed explanation. I changed my application to follow 
these rules and it works well.

I think you could add this to the manual; it's quite useful!

Best regards
Damien

Original comment by damien.p...@gmail.com on 4 Feb 2014 at 4:59

GoogleCodeExporter commented 9 years ago

Original comment by jannik.s...@gmail.com on 4 Apr 2014 at 12:26