inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Add simple conll-like import file format #2820

Open jcklie opened 2 years ago

jcklie commented 2 years ago

Is your feature request related to a problem? Please describe.
Many users are surprised that the CoNLL-2003 format requires exactly the POS, NER, and CHUNK annotations and does not support custom layers.

Describe the solution you'd like
Add a very simple CoNLL-style format importer.

The open question is then how to export in that format again.

reckart commented 2 years ago

Actually, that was the idea behind WebAnno TSV - what would be different in your format?

jcklie commented 2 years ago

Just support tokens and one layer + one feature with BIO tagging that can be easily generated by ad-hoc scripts.
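To make the proposal concrete, here is a minimal sketch of what an ad-hoc script for such a format could look like. It is not an existing INCEpTION format: the two-column layout (token, tab, BIO tag), the blank line between sentences, and the function names `spans_to_bio`, `write_sentences`, and `read_sentences` are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]
Sentence = Tuple[List[str], List[Span]]


def spans_to_bio(num_tokens: int, spans: Iterable[Span]) -> List[str]:
    """Turn non-overlapping token spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def write_sentences(sentences: Iterable[Sentence]) -> str:
    """Serialize as 'token<TAB>tag' lines with a blank line between sentences."""
    blocks = []
    for tokens, spans in sentences:
        tags = spans_to_bio(len(tokens), spans)
        blocks.append("\n".join(f"{tok}\t{tag}" for tok, tag in zip(tokens, tags)))
    return "\n\n".join(blocks) + "\n"


def read_sentences(text: str) -> List[Sentence]:
    """Parse the assumed format back into (tokens, spans) pairs."""
    result = []
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue  # skip empty input
        tokens, spans = [], []
        start, label = None, None
        for i, line in enumerate(block.splitlines()):
            tok, tag = line.split("\t")
            tokens.append(tok)
            # Close the open span unless this token continues it.
            if start is not None and tag != f"I-{label}":
                spans.append((start, i, label))
                start, label = None, None
            # B- always opens a span; a stray I- without B- is tolerated.
            if tag.startswith("B-") or (tag.startswith("I-") and start is None):
                start, label = i, tag[2:]
        if start is not None:
            spans.append((start, len(tokens), label))
        result.append((tokens, spans))
    return result
```

A round trip through `write_sentences` and `read_sentences` returns the original tokens and spans, which is the export property asked about above. Because only one tag column exists, the sketch carries exactly one layer with one feature, as proposed.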

jfiala commented 2 years ago

We're using UIMA CAS XMI (XML 1.1) which allows custom layers which are also re-importable. To us it seemed to be the best clutterless format regarding readability for both users and programs.

reckart commented 2 years ago

Out of curiosity: how do you read the CAS XMI XML 1.1?

jfiala commented 2 years ago

After adding the feature "label" to our custom BIS layer, the labels of our KB get exported by INCEpTION automatically:

`

`

So the results are both human-readable and machine-readable :).

Maybe that feature should be documented?

For further ideas please see https://github.com/inception-project/inception/issues/3267 which would improve readability.

reckart commented 2 years ago

Sorry, I was unclear: what do you use to machine-read the CAS XMI XML 1.1?

Normally, we would recommend using DKPro Cassis for that purpose, but last time I checked, the XML library that Cassis uses did not support XML 1.1 (only XML 1.0).

jfiala commented 2 years ago

Thank you for the hint, we are primarily a PHP shop so we would start off using PHP XML for importing.

So far we are generating annotated CAS XMI XML 1.1 and importing it into INCEpTION for annotation review. This also works nicely with labels.