pelagios / recogito2

Semantic Annotation Without the Pointy Brackets
Apache License 2.0

CSV order and parts #568

Closed rsimon closed 6 years ago

rsimon commented 6 years ago

When exporting to CSV, annotations are currently unordered. That should be changed, although it's not a huge issue. A bigger issue is that the CSV currently doesn't include the filepart info, so it's not possible to tell which annotations come from which file when a document has multiple files. Somewhat interesting that no one has complained about this yet. Anyway, it's not difficult to fix.
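To illustrate the intended behaviour, here is a minimal Python sketch of an export that sorts by file part and then by position, and writes the part into its own column. The record fields (`part_seq`, `part_title`, `offset`, `quote`) and the column names are hypothetical and chosen for illustration only; they are not Recogito's actual schema or export format.

```python
import csv
import io

# Hypothetical annotation records; field names are illustrative assumptions.
annotations = [
    {"part_seq": 2, "part_title": "chapter2.txt", "offset": 5, "quote": "Athens"},
    {"part_seq": 1, "part_title": "chapter1.txt", "offset": 40, "quote": "Rome"},
    {"part_seq": 1, "part_title": "chapter1.txt", "offset": 3, "quote": "Troy"},
]


def export_csv(annotations):
    """Return CSV text with rows ordered by file part, then by offset within the part."""
    ordered = sorted(annotations, key=lambda a: (a["part_seq"], a["offset"]))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The FILE column lets readers tell apart annotations from different parts.
    writer.writerow(["FILE", "OFFSET", "QUOTE"])
    for a in ordered:
        writer.writerow([a["part_title"], a["offset"], a["quote"]])
    return buffer.getvalue()


print(export_csv(annotations))
```

Sorting on the tuple `(part_seq, offset)` keeps each file's annotations together and in reading order, which is all that plaintext documents should need.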

ChiaraPalladino commented 6 years ago

Sorry to reopen this, but I am experiencing the problem of unordered annotations with files imported from TEI XML. I am not sure about the filepart info, since I've been dealing with small texts so far.

rsimon commented 6 years ago

Hi Chiara, yes indeed that's a known issue. The ordering does not work for TEI documents. I'm happy to keep the issue open - and I'd be excited if someone else would be willing to pick this up. But, as for myself, I won't have the time to look into it anytime soon.

To explain the issue: for plaintext annotations, sorting is straightforward, since the character offset is part of the annotation anchor. So everything we need to sort them by text sequence is there. For TEI, the position of an annotation is denoted by XPath expressions. It might be possible to sort XPath pointers sensibly, I guess, but I haven't had time to think the issue through yet. It might well be that the only reliable way to sort TEI annotations is to first compute the character offsets (from the text + XPath). In other words, sorting them is quite a bit more difficult.
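As a rough sketch of the "compute the character offsets first" idea: walk the XML once in document order, record the character offset at which each element's text begins, then add the annotation's local offset to get a document-wide position to sort on. The anchor shape used here (a fully indexed positional path plus a local offset counted over the element's full text content) is an assumption made for the example, not Recogito's actual anchor format, and the sample document omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Simplified sample without the TEI namespace (an assumption for brevity).
TEI_SAMPLE = (
    "<TEI><text><body>"
    "<p>Sing, goddess</p>"
    "<p>of <name>Achilles</name> the wrath</p>"
    "</body></text></TEI>"
)

# Hypothetical anchors: positional path plus offset into that element's text.
annotations = [
    {"id": "a", "path": "/TEI[1]/text[1]/body[1]/p[2]/name[1]", "offset": 0},
    {"id": "b", "path": "/TEI[1]/text[1]/body[1]/p[1]", "offset": 6},
    {"id": "c", "path": "/TEI[1]/text[1]/body[1]/p[2]", "offset": 16},
]


def element_start_offsets(root):
    """Map each element's positional path to the offset where its text starts."""
    starts = {}
    position = 0  # characters seen so far, in document order

    def walk(elem, path):
        nonlocal position
        starts[path] = position
        position += len(elem.text or "")
        counts = {}  # per-tag sibling counters, used to build [n] indexes
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            position += len(child.tail or "")

    walk(root, f"/{root.tag}[1]")
    return starts


def sort_by_text_sequence(annotations, root):
    """Sort annotations by their absolute character offset in the document text."""
    starts = element_start_offsets(root)
    return sorted(annotations, key=lambda a: starts[a["path"]] + a["offset"])


root = ET.fromstring(TEI_SAMPLE)
print([a["id"] for a in sort_by_text_sequence(annotations, root)])
```

Real TEI files would need namespace handling and a decision on how whitespace is counted, so this only shows the general shape of the approach.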

As I said: it would be great if someone would be willing to pick this up. Otherwise I'm afraid it needs to stay open for the foreseeable future.