Closed rsimon closed 6 years ago
Sorry to reopen this, but I am experiencing the problem of unordered annotations with files imported from TEI XML. I am not sure about the filepart, since I've only been dealing with small texts so far.
Hi Chiara, yes indeed that's a known issue. The ordering does not work for TEI documents. I'm happy to keep the issue open - and I'd be excited if someone else would be willing to pick this up. But, as for myself, I won't have the time to look into it anytime soon.
To explain the issue: for plaintext annotations, sorting is straightforward, since the character offset is part of the annotation anchor, so everything we need to sort them by text sequence is already there. For TEI, the position of an annotation is denoted by XPath expressions. It might be possible to sort XPath pointers sensibly, I guess, but I haven't had time to think the issue through yet. It may well be that the only reliable way to sort TEI annotations is to first compute their character offsets (from the text + XPath). In other words, sorting them is quite a bit more difficult.
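A rough sketch of the "compute character offsets first" idea, in case it helps whoever picks this up. It is not the actual implementation: the anchor shape (an element path plus a character offset inside that element), the field names `path` and `offset`, and the helper names are all assumptions for illustration. Real TEI would also need namespace handling and a decision on whitespace normalization.

```python
import xml.etree.ElementTree as ET


def element_start_offsets(root):
    """Map each element to the absolute character offset where its content starts."""
    offsets = {}
    pos = 0

    def walk(el):
        nonlocal pos
        offsets[el] = pos
        pos += len(el.text or "")
        for child in el:
            walk(child)
            # Text following a child still belongs to the parent's content.
            pos += len(child.tail or "")

    walk(root)
    return offsets


def sort_annotations(root, annotations):
    """Sort annotations by absolute offset = element start + offset inside the element.

    Each annotation is a dict with hypothetical keys 'path' (an ElementTree
    path relative to the root) and 'offset' (characters from the element start).
    """
    offsets = element_start_offsets(root)

    def key(annotation):
        el = root.find(annotation["path"])
        if el is None:
            raise ValueError("no element at " + annotation["path"])
        return offsets[el] + annotation["offset"]

    return sorted(annotations, key=key)


# Made-up sample document, without namespaces for brevity.
tei = ET.fromstring(
    "<TEI><text><body>"
    "<p>Hello <hi>brave</hi> world</p><p>Second para</p>"
    "</body></text></TEI>"
)
sample = [
    {"id": "a", "path": "text/body/p[2]", "offset": 0},
    {"id": "b", "path": "text/body/p[1]", "offset": 12},
    {"id": "c", "path": "text/body/p[1]/hi", "offset": 0},
]
print([a["id"] for a in sort_annotations(tei, sample)])  # ['c', 'b', 'a']
```

Note that comparing the XPath pointers alone would not be enough here: annotation `b` is anchored on the outer `p` and `c` on the nested `hi`, yet `c` comes first in the text. That is why the sketch resolves everything to absolute offsets before sorting.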
As I said: it would be great if someone were willing to pick this up. Otherwise I'm afraid it needs to stay open for the foreseeable future.
When exporting to CSV, annotations are currently unordered. That should be changed (although it's not a huge issue). A bigger issue, however, is that the CSV currently doesn't include the filepart info. That means it's not possible to tell which annotations come from which file when the document has multiple files. Somewhat interesting that no one has complained about this yet. Anyway, it's not difficult to fix.
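For illustration, a minimal sketch of what the fixed export could look like for plaintext annotations: rows are sorted by filepart and then by character offset, and the filepart is written as its own column. The field names (`filepart`, `filepart_seq`, `offset`, `quote`) and the column layout are hypothetical and not the actual export schema.

```python
import csv
import io


def export_csv(annotations):
    """Serialize annotations to CSV, ordered by filepart sequence and offset."""
    rows = sorted(annotations, key=lambda a: (a["filepart_seq"], a["offset"]))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # The filepart column makes it possible to tell files apart.
    writer.writerow(["filepart", "offset", "quote"])
    for a in rows:
        writer.writerow([a["filepart"], a["offset"], a["quote"]])
    return buf.getvalue()


# Made-up sample data for a document with two fileparts.
sample = [
    {"filepart": "chapter2.txt", "filepart_seq": 2, "offset": 5, "quote": "Athens"},
    {"filepart": "chapter1.txt", "filepart_seq": 1, "offset": 40, "quote": "Rome"},
    {"filepart": "chapter1.txt", "filepart_seq": 1, "offset": 3, "quote": "Troy"},
]
print(export_csv(sample))
```

Sorting on the filepart's sequence number rather than its name keeps the files in document order even when the filenames do not sort alphabetically.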