inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Stable IDs of exported features #4960

Open tpluscode opened 2 months ago

tpluscode commented 2 months ago

I noticed that when exporting in CAS RDF format, the features are identified by hash URLs like <doc:example.pdf#48603>.

Unfortunately, adding or removing annotations and links appears to change these numbers.

Would it be possible to also export some stable identifier that is assigned to an annotation and does not change?

reckart commented 2 months ago

It would require first adding some kind of persistent ID management to INCEpTION itself. There are currently no persistent IDs generated internally, so none can be exported. INCEpTION can import IDs on tokens and sentences and reproduce them during export, but there are no such ID fields on other annotation types. Also, INCEpTION has no mechanism for managing such IDs, e.g. to avoid duplicates.

There are IDs used internally, but they are not stable over time: the ID space is compacted every time you open a document. Also, these IDs are an implementation detail of UIMA that is not (fully) under the control of INCEpTION.

tpluscode commented 2 months ago

Well, on second thought, the document text is immutable, so I can use the combination of the sofa start/end offsets of an annotation as a stable identifier, at least for non-overlapping layers.
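
For anyone trying the same workaround, here is a minimal sketch of the offset-based approach in Python. It is not part of INCEpTION or its export. The `Span` type, the `stable_id` format, and the `find_collisions` helper are all hypothetical names, and the sketch assumes the layer name and begin/end offsets have already been read from the export. The collision check is there because two annotations on the same layer with identical offsets would end up with the same identifier.

```python
from collections import Counter
from typing import List, NamedTuple, Sequence


class Span(NamedTuple):
    """Hypothetical minimal view of an annotation: layer name plus sofa offsets."""
    layer: str
    begin: int
    end: int


def stable_id(doc: str, span: Span) -> str:
    """Build an identifier from document name, layer, and character offsets.

    The format is an assumption made for this sketch, not an INCEpTION convention.
    """
    return f"{doc}#{span.layer}-{span.begin}-{span.end}"


def find_collisions(doc: str, spans: Sequence[Span]) -> List[str]:
    """Return the identifiers that more than one annotation would share."""
    counts = Counter(stable_id(doc, s) for s in spans)
    return [key for key, n in counts.items() if n > 1]


if __name__ == "__main__":
    spans = [
        Span("NamedEntity", 10, 15),
        Span("NamedEntity", 20, 31),
        Span("NamedEntity", 10, 15),  # same layer, same offsets: collides
    ]
    for s in spans:
        print(stable_id("example.pdf", s))
    print("collisions:", find_collisions("example.pdf", spans))
```

Including the layer name keeps annotations from different layers apart even when they cover the same text. Annotations that stack on one layer with the same offsets would still need an extra discriminator.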