korpling / annatto

Converts linguistic data formats based on the graphANNIS data model as intermediate representation and can apply consistency tests.
Apache License 2.0

Adapt import of EXMARaLDA files to new editor behaviour #217

Open MartinKl opened 3 months ago

MartinKl commented 3 months ago

In the latest version of EXMARaLDA, a tokenizer tool is included that automatically splits spans into tokens. To do so, it introduces new timeline items (tlis). These carry no time value (i.e. the XML element has no attribute time). Their ordering is encoded in their id value (e.g. "T32.TIE0.1", "T32.TIE0.2", ...). Sometimes there are even untimed tags with a regular id (e.g. "T128"); at which point in the editing of a file these are created is unclear so far. As a consequence, the only reliable ordering of tlis is the order in which they appear in the XML file, so this order has to be used for sorting events on import. The parser therefore needs to deliver its callbacks in document order. Maybe it already does, but we need to check.
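
To illustrate the idea, here is a minimal sketch in Python (not the annatto importer itself, and not its actual parser setup). It assumes the usual EXMARaLDA layout with tli elements inside a common timeline and event elements that reference tlis through start and end attributes. The sample document, the event texts, and the helper names `timeline_order` and `sorted_events` are made up for this example. The sketch ignores the time attribute and the id structure entirely and ranks every tli by its position in the file:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: timed tlis, tokenizer-generated untimed tlis,
# and one untimed tli with a regular id ("T128").
SAMPLE = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T32" time="1.5"/>
      <tli id="T32.TIE0.1"/>
      <tli id="T32.TIE0.2"/>
      <tli id="T128"/>
      <tli id="T33" time="2.0"/>
    </common-timeline>
    <tier id="TIE0">
      <event start="T128" end="T33">d</event>
      <event start="T32" end="T32.TIE0.1">a</event>
      <event start="T32.TIE0.2" end="T128">c</event>
      <event start="T32.TIE0.1" end="T32.TIE0.2">b</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def timeline_order(root):
    """Map each tli id to its position in document order."""
    # Element.iter() walks the tree in document order, so the
    # enumeration index is the rank of the tli in the file.
    return {tli.get("id"): i for i, tli in enumerate(root.iter("tli"))}


def sorted_events(root):
    """Return (start, end, text) tuples sorted by timeline position."""
    order = timeline_order(root)
    events = [
        (ev.get("start"), ev.get("end"), ev.text or "")
        for ev in root.iter("event")
    ]
    # Sort by the rank of the start tli, then by the rank of the end tli.
    return sorted(events, key=lambda ev: (order[ev[0]], order[ev[1]]))


root = ET.fromstring(SAMPLE)
order = timeline_order(root)
ordered = sorted_events(root)
```

With a streaming parser the same mapping can be built by incrementing a counter each time a tli start element is encountered, provided the parser really reports elements in document order, which is the point that still needs checking.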