tarsqi / ttk

Tarsqi Toolkit
Apache License 2.0

DCT in Metadata tag vs TIMEX functionInDocument="CREATION_TIME" #24

Closed. reevesr closed this issue 7 years ago

reevesr commented 8 years ago

Related to issue #10. It is possible to get a different value for DCT in the metadata tag than the value in the TIMEX3 tag with functionInDocument="CREATION_TIME".
I ran tarsqi.py with the full pipeline on an un-annotated TimeBank file (ABC19980108.1830.0711.xml, from ttk/code/data/inTimeBank/) and processed this document 3 different ways, with 3 different results:

  1. source=timebank (metadata dct = timex creation_time value)
  2. source=ttk (metadata dct != timex creation_time value; metadata dct value = today's date)
  3. no source flag given (no timex tag with creation_time is generated; metadata dct value = today's date)

Also, setting source to 'xml' yields the same results as not specifying a source.

Attached files are named after the source flag used in processing: ABC_NoSouce_out.txt, ABC_TimeBankSource_out.txt, ABC_TTKsource_out.txt, ABC_xmlSource_out.txt
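
For anyone trying to reproduce this, the mismatch can be checked mechanically rather than by reading the output files. The following is only a rough sketch: it assumes the output is well-formed XML with a `dct` element carrying a `value` attribute and `TIMEX3` elements carrying `value` and `functionInDocument` attributes. The helper names and the sample string are made up for illustration and are not part of the toolkit.

```python
import xml.etree.ElementTree as ET


def dct_values(xml_text):
    """Return (metadata_dct, creation_time_values) found in the output.

    Assumption: the metadata DCT is stored as <dct value="..."/> and the
    creation time as a TIMEX3 element with functionInDocument="CREATION_TIME".
    """
    root = ET.fromstring(xml_text)
    # Take the first <dct> element anywhere in the tree, if there is one.
    dct_node = next(root.iter("dct"), None)
    metadata_dct = dct_node.get("value") if dct_node is not None else None
    # Collect the value of every TIMEX3 marked as the creation time.
    creation_times = [
        timex.get("value")
        for timex in root.iter("TIMEX3")
        if timex.get("functionInDocument") == "CREATION_TIME"
    ]
    return metadata_dct, creation_times


def is_consistent(xml_text):
    """True only if a CREATION_TIME timex exists and all of them match the DCT."""
    metadata_dct, creation_times = dct_values(xml_text)
    return bool(creation_times) and all(v == metadata_dct for v in creation_times)


# Hypothetical output fragment showing case 2 (the two values differ).
SAMPLE = """<ttk>
  <metadata><dct value="20160301"/></metadata>
  <tarsqi_tags>
    <TIMEX3 tid="t0" type="DATE" value="19980108" functionInDocument="CREATION_TIME"/>
  </tarsqi_tags>
</ttk>"""

print(dct_values(SAMPLE))
print(is_consistent(SAMPLE))
```

With this check, case 3 (no CREATION_TIME timex at all) is also reported as inconsistent, since the list of creation times is empty.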

reevesr commented 8 years ago

Omission typo. The first line should read: It is possible to get a different value for DCT in the metadata tag than the value in the TIMEX3 tag with functionInDocument="CREATION_TIME".

marcverhagen commented 8 years ago

I cannot replicate this problem. I ran the following on the input file:

python tarsqi.py --source=timebank data/in/TimeBank/ABC19980108.1830.0711.xml out-timebank.xml
python tarsqi.py --source=ttk data/in/TimeBank/ABC19980108.1830.0711.xml out-ttk.xml
python tarsqi.py --source=xml data/in/TimeBank/ABC19980108.1830.0711.xml out-xml.xml

I get pretty much the results I expect: the first takes the DCT from the document name, the second fails (because the input is not in ttk format), and the third uses the current date for the DCT. GUTIME does not seem to add a CREATION_TIME to any TIMEX3 tag in the document, which worries me a bit.

So I think I need the specific command for each case as well as the input file.

Incidentally, if no source is given the default used is xml.
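
To make the expected behavior per source concrete, here is a small sketch of the rule as described above: a date from the document name for timebank, the current date otherwise. This is not the toolkit's code. The function name `expected_dct`, the 8-digit date pattern, and the YYYYMMDD output format are all assumptions made for illustration.

```python
import os
import re
from datetime import date


def expected_dct(source, filename, today=None):
    """Guess which DCT a run should report for a given source flag.

    Assumption: for source 'timebank' the DCT is the first run of 8 digits
    in the file name (e.g. ABC19980108.1830.0711.xml); any other source,
    or a file name without such a run, falls back to the current date.
    """
    today = today or date.today()
    if source == "timebank":
        match = re.search(r"\d{8}", os.path.basename(filename))
        if match:
            return match.group(0)
    return today.strftime("%Y%m%d")


path = "data/in/TimeBank/ABC19980108.1830.0711.xml"
print(expected_dct("timebank", path))
print(expected_dct("xml", path))
```

Comparing this expected value against both the metadata DCT and the CREATION_TIME timex in each output file would show which of the two diverges for a given source flag.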