quadrama / DramaNLP

UIMA NLP components for dramatic texts
Apache License 2.0

Missing date information #69

Open pagelj opened 5 years ago

pagelj commented 5 years ago

Some of the dates in the metadata table (e.g. written, premiere) are no longer being exported.

pagelj commented 5 years ago

This issue occurs when the date is given only in the when attribute and not as element text. For example,

<date type="print" when="1888">1888</date>

will be exported, but

<date type="printed" when="1888"/>

will not.

An example is https://github.com/quadrama/gerdracor/blob/quadrama/data/1871-Anzengruber_Ludwig-Der_Meineidbauer-lina.xml
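
The difference between the two forms can be reproduced outside of DramaNLP. The following is a minimal sketch in Python, not the GenericXmlReader code: the wrapping bibl element and the helper names dates_by_text and dates_by_attribute are made up for illustration. It shows that a lookup based only on element text skips the empty element, while a lookup that reads the when attribute finds both dates.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the two <date> forms quoted above, wrapped in a
# <bibl> element that is assumed here only to get a single root.
SAMPLE = """<bibl>
  <date type="print" when="1888">1888</date>
  <date type="printed" when="1888"/>
</bibl>"""


def dates_by_text(root):
    """Collect dates only from elements that have text content.

    Empty elements such as <date .../> have no text and are skipped.
    """
    return {d.get("type"): d.text for d in root.iter("date") if d.text}


def dates_by_attribute(root):
    """Collect dates from the `when` attribute, falling back to the text."""
    result = {}
    for d in root.iter("date"):
        value = d.get("when") or (d.text or "").strip()
        if value:
            result[d.get("type")] = value
    return result


root = ET.fromstring(SAMPLE)
text_only = dates_by_text(root)
with_attributes = dates_by_attribute(root)
print(text_only)
print(with_attributes)
```

Reading the attribute first means the export no longer depends on whether the encoder repeated the date as text inside the element.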

pagelj commented 5 years ago

@nilsreiter This is caused by https://github.com/nilsreiter/uima-util/blob/6f3bc3b8785702fe4ab9b670b87eaa215006f593/src/main/java/de/unistuttgart/ims/uimautil/GenericXmlReader.java#L133 as it does not detect elements without text.

pagelj commented 4 years ago

This is fixed by https://github.com/nilsreiter/generic-xml-reader/commit/0f8c085a76dd57e43e9e5dcb81e8ab4055b36bee, but generic-xml-reader version 1.6 requires a higher DKPro version than the one currently used in DramaNLP. Unfortunately, updating to this DKPro version is not trivial, so this will be postponed until https://github.com/nilsreiter/generic-xml-reader/issues/12 and https://github.com/quadrama/DramaNLP/issues/45 have been resolved.