inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0
593 stars, 151 forks

Downloading text of annotation sequences with labels #1844

Closed. zacyapjq closed this issue 3 years ago

zacyapjq commented 3 years ago

Instead of using one of the export formats provided in INCEpTION, is there a way to export the raw text sequences I labelled, along with their label names?

My use case is creating a dataset in which I annotate the sentences in my text as text sequences (sentence boundary detection is not well solved for my domain) and assign a custom tag to each sentence.

An ideal export format would be a txt file with one labelled sequence per line: text label

The data format I am hoping to achieve is similar to what you see here: https://github.com/Law-AI/semantic-segmentation/blob/master/data/text/1953_L_1.txt

jcklie commented 3 years ago

You can export as XMI and convert it to whatever format you want, e.g. using DKPro cassis [1]. That should be pretty simple for your use case.

[1] https://github.com/dkpro/dkpro-cassis
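
For illustration, here is a minimal sketch of that approach in Python. It assumes the XMI export has been unzipped into a document `.xmi` file plus its `TypeSystem.xml`, that the sentences were annotated on a custom span layer, and that a tab-separated "text label" line is acceptable. The type name, feature name, and file names in the usage comment are hypothetical placeholders and have to be replaced with the ones from your own project.

```python
def format_lines(pairs, sep="\t"):
    """Turn (text, label) pairs into 'text<sep>label' lines, one per pair."""
    lines = []
    for text, label in pairs:
        # Collapse line breaks and repeated whitespace so each span stays on one line.
        flat = " ".join(text.split())
        lines.append(f"{flat}{sep}{'' if label is None else label}")
    return lines


def export_spans(xmi_path, typesystem_path, out_path, type_name, feature_name):
    """Write the covered text and label of every `type_name` annotation to out_path."""
    # Imported here so format_lines can be used without the library installed
    # (pip install dkpro-cassis).
    from cassis import load_cas_from_xmi, load_typesystem

    with open(typesystem_path, "rb") as f:
        typesystem = load_typesystem(f)
    with open(xmi_path, "rb") as f:
        cas = load_cas_from_xmi(f, typesystem=typesystem)

    # Sort by start offset so the output follows document order.
    spans = sorted(cas.select(type_name), key=lambda a: a.begin)
    pairs = [(a.get_covered_text(), getattr(a, feature_name, None)) for a in spans]

    with open(out_path, "w", encoding="utf-8") as out:
        for line in format_lines(pairs):
            out.write(line + "\n")


# Hypothetical usage; the layer and feature names depend on your project:
# export_spans("document.xmi", "TypeSystem.xml", "document.txt",
#              type_name="webanno.custom.SentenceLabel", feature_name="label")
```

The formatting step is kept separate from the XMI loading so the separator or line layout can be changed without touching the cassis calls.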

reckart commented 3 years ago

I have provided an example as a Python notebook linked on our website:

https://inception-project.github.io/example-projects/python/

You can open it directly on Google Colab and try it out.