jonnyfoka closed 1 year ago
No. If I understand your use case correctly, I would use https://pypi.org/project/spacy-conll/ or something similar. If you want to use INCEpTION, use https://github.com/dkpro/dkpro-cassis to convert the data to XMI, which is the best format for INCEpTION. Some examples are in the notebooks listed at the bottom of https://github.com/inception-project/inception#see-our-documentation-for-further-reading .
Note that if you want to map your data to custom annotation layers in INCEpTION, you should use UIMA CAS XMI (via dkpro-cassis). CoNLL-U will only help if you stick to the handful of layers that CoNLL-U and INCEpTION's CoNLL-U reader/writer support.
Looks like there is a version of this question also on SO:
https://stackoverflow.com/questions/75070901/convert-prodigy-jsonl-spacy-doc-format-to-conll
Closing as there was no further feedback.
Is your feature request related to a problem? Please describe.
For a relation classification task, I have annotated several news-like text documents with the Prodigy annotation software. Prodigy exports the annotations as a JSONL file, which can be converted into a .spacy file. In the JSONL format, each line represents one news article with its annotations.
Now I want to convert my annotations into a more standardized format such as CoNLL, so that I can work with them in INCEpTION (Prodigy has not been a good choice). Unfortunately, I haven't found any library, script, or tool that can convert Prodigy JSONL or .spacy files to CoNLL.
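To make the request concrete, here is a minimal sketch of the kind of conversion meant, using only the Python standard library. It assumes each JSONL line carries `tokens` (each with a `text` key) and `spans` (each with `token_start`, `token_end`, and `label`, where `token_end` is an inclusive token index); the function names and the sample record are made up for illustration. It writes a simple two-column, CoNLL-2003-like layout (token, BIO tag), not full CoNLL-U, and it does not cover the relation annotations.

```python
import json


def prodigy_record_to_conll(record):
    """Return one 'token<TAB>tag' line per token of a single record."""
    tokens = record["tokens"]
    tags = ["O"] * len(tokens)
    for span in record.get("spans", []):
        # Assumption: token_start/token_end are inclusive token indices.
        start, end = span["token_start"], span["token_end"]
        tags[start] = "B-" + span["label"]
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + span["label"]
    return [f"{tok['text']}\t{tag}" for tok, tag in zip(tokens, tags)]


def jsonl_to_conll(jsonl_text):
    """Convert JSONL text (one document per line) to CoNLL-style text.

    Documents are separated by a blank line.
    """
    documents = []
    for line in jsonl_text.splitlines():
        if line.strip():  # skip empty lines
            record = json.loads(line)
            documents.append("\n".join(prodigy_record_to_conll(record)))
    return "\n\n".join(documents) + "\n"


# Hypothetical sample record, invented for this sketch.
sample_record = {
    "text": "Apple hired Tim Cook",
    "tokens": [
        {"text": "Apple", "start": 0, "end": 5, "id": 0},
        {"text": "hired", "start": 6, "end": 11, "id": 1},
        {"text": "Tim", "start": 12, "end": 15, "id": 2},
        {"text": "Cook", "start": 16, "end": 20, "id": 3},
    ],
    "spans": [
        {"start": 0, "end": 5, "token_start": 0, "token_end": 0, "label": "ORG"},
        {"start": 12, "end": 20, "token_start": 2, "token_end": 3, "label": "PERSON"},
    ],
}

sample_jsonl = json.dumps(sample_record) + "\n"
conll = jsonl_to_conll(sample_jsonl)
print(conll, end="")
```

Relations between spans do not fit naturally into this column layout, which is part of why a dedicated converter or importer would be helpful.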
Describe the solution you'd like
Allow importing pre-annotated documents in the .spacy format.
Describe alternatives you've considered
A converter for pre-annotated .spacy Doc files (e.g. to CoNLL-U).