Open tpluscode opened 3 months ago
It would first require adding some kind of persistent ID management to INCEpTION itself. Currently, no persistent IDs are generated internally, so none can be exported. INCEpTION can import IDs on tokens and sentences and reproduce them during export, but there are no such ID fields on other annotation types. Also, INCEpTION has no mechanism for managing such IDs, e.g. to avoid duplicates.
There are IDs used internally, but they are not stable over time: the ID space is compacted every time you open a document. Also, these IDs are an implementation detail of UIMA that is not (fully) under the control of INCEpTION.
Well, on second thought, the document text is immutable, so I can use the combination of an annotation's sofa start/end offsets as a stable identifier. At least for non-overlapping layers.
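A minimal sketch of that workaround, assuming the annotations have already been read from the export into simple records. The `Annotation` type, the `stable_id` and `find_collisions` helpers, and the ID format are all hypothetical and not part of INCEpTION or UIMA. The collision check is there because the offsets are only unique as long as no two annotations on the same layer cover exactly the same span.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical record for one exported annotation."""
    layer: str  # annotation type / layer name
    begin: int  # sofa start offset
    end: int    # sofa end offset


def stable_id(doc: str, ann: Annotation) -> str:
    """Build an identifier from the document name, layer and offsets.

    It only stays stable because the document text is immutable.
    """
    return f"{doc}#{ann.layer}:{ann.begin}-{ann.end}"


def find_collisions(doc: str, annotations: List[Annotation]) -> List[str]:
    """Return the IDs that more than one annotation would share."""
    seen = set()
    collisions = []
    for ann in annotations:
        ident = stable_id(doc, ann)
        if ident in seen and ident not in collisions:
            collisions.append(ident)
        seen.add(ident)
    return collisions
```

If `find_collisions` returns anything, that layer has annotations with identical spans and would need an extra discriminator (for example a feature value) in the ID.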
I noticed that when exporting in CAS RDF format, the features are identified by hash URLs like <doc:example.pdf#48603>. Unfortunately, adding or removing annotations and links appears to change these numbers.
Would it be possible to also export some stable identifier that is assigned to an annotation and does not change?