Open jcklie opened 2 years ago
Actually, that was the idea behind WebAnno TSV - what would be different in your format?
Just support tokens and one layer + one feature with BIO tagging that can be easily generated by ad-hoc scripts.
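To make the "easily generated by ad-hoc scripts" part concrete, here is a minimal sketch of what such a script might look like. The two-column, tab-separated layout (token, BIO tag) and the helper names `spans_to_bio` and `to_conll` are assumptions for illustration only, not an existing INCEpTION format or API.

```python
from typing import List, Tuple

# A span is (first_token_index, end_token_index_exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Return one BIO tag per token for a single layer with a single feature."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


def to_conll(tokens: List[str], tags: List[str]) -> str:
    """Serialize one sentence as 'token<TAB>tag' lines (hypothetical layout)."""
    return "\n".join(f"{tok}\t{tag}" for tok, tag in zip(tokens, tags)) + "\n"


tokens = ["Angela", "Merkel", "visited", "Paris", "."]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
tags = spans_to_bio(tokens, spans)
print(to_conll(tokens, tags), end="")
```

Overlapping or nested spans cannot be represented this way, which is the price of keeping it to one layer and one feature.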
We're using UIMA CAS XMI (XML 1.1), which allows custom layers that can also be re-imported. To us, it seemed to be the least cluttered format, readable for both users and programs.
Out of curiosity: how do you read the CAS XMI XML 1.1?
After adding the feature "label" to our custom BIS layer, the labels of our KB get exported by Inception automatically:
`
`
So the results are both human-readable and machine-readable :).
Maybe that feature should be documented?
For further ideas please see https://github.com/inception-project/inception/issues/3267 which would improve readability.
Sorry, I was unclear: what do you use to machine-read the CAS XMI XML 1.1?
Normally, we would recommend using DKPro Cassis for that purpose, but last time I checked, the XML library that cassis is using did not support XML 1.1 (only XML 1.0).
Thank you for the hint. We are primarily a PHP shop, so we would start off using PHP XML for importing.
So far we are generating annotated CAS XMI XML 1.1 and importing it into Inception for annotation review. This also works nicely with labels.
Is your feature request related to a problem? Please describe.
It is a bit surprising for many that CoNLL-2003 needs the exact POS, NER, CHUNK annotations and does not support custom layers.
Describe the solution you'd like
Add a very simple CoNLL-style format importer.
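As a rough illustration of what such an importer would have to do, the sketch below reads a two-column file and turns the BIO tags back into spans. The column layout (token, tab, tag, blank line between sentences) and the function names `read_conll` and `bio_to_spans` are assumptions made for this example; nothing here reflects how INCEpTION would actually implement it.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]      # (token, tag) pairs
Span = Tuple[int, int, str]           # (start, end_exclusive, label)


def read_conll(text: str) -> List[Sentence]:
    """Split the text into sentences; blank lines separate sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")  # assumes exactly two columns
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags into spans; a stray I- tag leniently opens a new span."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):   # trailing "O" closes an open span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


text = "Angela\tB-PER\nMerkel\tI-PER\nvisited\tO\n\nParis\tB-LOC\n"
sentences = read_conll(text)
first_spans = bio_to_spans([tag for _, tag in sentences[0]])
```

The spans are token indices per sentence; mapping them to character offsets would additionally need a rule for how tokens are joined back into text.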
Issue then is how to export in that format again.