inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Allow import of pre-annotated spaCy Doc files #3722

Closed jonnyfoka closed 1 year ago

jonnyfoka commented 1 year ago

Is your feature request related to a problem? Please describe.

For a relation classification task, I have annotated several news-like text documents with the Prodigy annotation software. Prodigy exports the annotations as a JSONL file, which can be converted into a .spacy file. In the JSONL format, each line represents one news article with its annotations.

Now I want to convert my annotations into a more standardized format such as CoNLL, so that I can continue working with them in INCEpTION (Prodigy has not been a good choice). Unfortunately, I haven't found any library, script, or tool that can convert Prodigy JSONL/spaCy to CoNLL.

Describe the solution you'd like

Allow importing pre-annotated documents in the .spacy format.

Describe alternatives you've considered

A converter from pre-annotated .spacy Docs to another format (e.g. CoNLL-U).
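A minimal, stdlib-only sketch of such a converter might look like the following. It reads Prodigy JSONL directly (not the binary .spacy file) and writes a simple two-column, CoNLL-2002-style layout of token and BIO tag. It assumes each record has a "tokens" list (with "text", "start", "end") and an optional "spans" list (with "start", "end", "label") using character offsets; those field names are assumptions about Prodigy's output, and `record_to_conll` / `jsonl_to_conll` are hypothetical helpers, not part of Prodigy, spaCy, or INCEpTION.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def record_to_conll(record: dict) -> list[str]:
    """Turn one Prodigy-style record into 'token<TAB>BIO-tag' lines.

    Assumes "tokens" entries carry "text", "start", "end" and "spans" entries
    carry "start", "end", "label" (character offsets, end exclusive).
    """
    tags = ["O"] * len(record["tokens"])
    for span in record.get("spans", []):
        # Tokens lying entirely inside the span's character range belong to it.
        inside = [
            i
            for i, tok in enumerate(record["tokens"])
            if tok["start"] >= span["start"] and tok["end"] <= span["end"]
        ]
        for position, i in enumerate(inside):
            # First token of the span gets B-, the rest get I-.
            prefix = "B" if position == 0 else "I"
            tags[i] = f"{prefix}-{span['label']}"
    return [f"{tok['text']}\t{tag}" for tok, tag in zip(record["tokens"], tags)]


def jsonl_to_conll(lines: Iterable[str]) -> Iterator[str]:
    """Yield CoNLL-style lines, one blank line after each document."""
    for line in lines:
        if not line.strip():
            continue  # skip empty lines in the JSONL file
        yield from record_to_conll(json.loads(line))
        yield ""
```

Usage would be along the lines of `print("\n".join(jsonl_to_conll(open("annotations.jsonl"))))`, where the file name is only an example. Note that plain BIO tags cannot express overlapping spans (a later span overwrites an earlier one here) and have no column for relations, so this only covers the entity part of a relation classification dataset; check the column layout against whichever CoNLL reader you target.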

jcklie commented 1 year ago

No. If I understand your use case correctly, I would use https://pypi.org/project/spacy-conll/ or something similar. Alternatively, if you want to use INCEpTION, use https://github.com/dkpro/dkpro-cassis to convert your data to XMI, which is the best format for INCEpTION. Some examples are in the notebooks at the bottom of https://github.com/inception-project/inception#see-our-documentation-for-further-reading .

reckart commented 1 year ago

Note that if you want to map your data to custom annotation layers in INCEpTION, then you should use UIMA CAS XMI (dkpro-cassis); CoNLL-U will only help if you stick to the handful of layers that CoNLL-U and INCEpTION's CoNLL-U reader/writer support.
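Relations between multi-token spans, as produced for relation classification, do not fit the token-level columns of CoNLL-U, which is one reason a custom layer via XMI is the better target. As a first step towards that, the sketch below pulls relations out of a Prodigy-style record into plain character-offset tuples, the begin/end shape that span and relation annotations in a CAS are built from. It assumes each entry in "relations" has "head_span" and "child_span" (each with "start", "end", "label") plus a relation "label"; those field names are assumptions about Prodigy's relation output. `SpanRef`, `RelationRef`, and `extract_relations` are hypothetical helpers, not part of dkpro-cassis or INCEpTION; the governor/dependent naming is chosen to mirror INCEpTION's relation layers.

```python
from __future__ import annotations

from typing import NamedTuple


class SpanRef(NamedTuple):
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


class RelationRef(NamedTuple):
    governor: SpanRef   # taken from Prodigy's assumed "head_span"
    dependent: SpanRef  # taken from Prodigy's assumed "child_span"
    label: str


def extract_relations(record: dict) -> list[RelationRef]:
    """Collect character-offset relations from one Prodigy-style record."""
    result = []
    for rel in record.get("relations", []):
        head, child = rel["head_span"], rel["child_span"]
        result.append(
            RelationRef(
                governor=SpanRef(head["start"], head["end"], head.get("label", "")),
                dependent=SpanRef(child["start"], child["end"], child.get("label", "")),
                label=rel["label"],
            )
        )
    return result
```

Applied line by line to a JSONL file (e.g. `[extract_relations(json.loads(line)) for line in f]`), this yields offsets you could then use to create span and relation annotations of your own custom types with dkpro-cassis before writing XMI.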

reckart commented 1 year ago

Looks like there is also a version of this question on Stack Overflow:

https://stackoverflow.com/questions/75070901/convert-prodigy-jsonl-spacy-doc-format-to-conll

reckart commented 1 year ago

Closing as there was no further feedback.