Closed 2xXLunAXx2 closed 1 year ago
I think those are basically the two options. But there may be details to make them more attractive / viable.
For option 1, where you upload all the files, you could pre-process them outside the tool to mark the sentences that you want the users to annotate. Then you could instruct the annotators to use the search sidebar to find those sentences and annotate them. That way, nobody would need to hunt for the target sentences manually.
For option 2, where you pre-process the data to extract the relevant sentences and place them into new (batch) documents, you should also create a dedicated annotation layer to capture the source of the data (e.g. original filename, begin/end). That layer does not need to be visible to the annotators; it only needs to be defined in the project settings. Once your annotators are done, you can use this source information to map the annotations back to the original data.
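To make the bookkeeping for option 2 concrete, here is a minimal sketch in plain Python. It does not use INCEpTION or DKPro Cassis; the names `SourceSpan`, `build_batch`, and `map_back` are hypothetical, and it assumes that the indices of the target sentences are character offsets into the original text. In a real setup, the recorded source information would be stored as annotations on the dedicated layer rather than in a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpan:
    """Origin of one extracted sentence (hypothetical record layout)."""
    filename: str     # original document the sentence was taken from
    begin: int        # character offset of the sentence in the original
    end: int          # end offset (exclusive) in the original
    batch_begin: int  # character offset of the sentence in the batch text


def build_batch(documents, targets):
    """Join the target sentences into one batch text and record their origin.

    documents: dict mapping filename -> full text
    targets:   list of (filename, begin, end) character offsets
    """
    parts, spans, cursor = [], [], 0
    for filename, begin, end in targets:
        sentence = documents[filename][begin:end]
        spans.append(SourceSpan(filename, begin, end, cursor))
        parts.append(sentence)
        cursor += len(sentence) + 1  # +1 for the newline separator
    return "\n".join(parts), spans


def map_back(spans, ann_begin, ann_end):
    """Translate batch offsets of an annotation to (filename, begin, end)."""
    for span in spans:
        batch_end = span.batch_begin + (span.end - span.begin)
        if span.batch_begin <= ann_begin and ann_end <= batch_end:
            shift = span.begin - span.batch_begin
            return span.filename, ann_begin + shift, ann_end + shift
    raise ValueError("annotation is not inside a single extracted sentence")
```

With this, an annotation made in the batch document at offsets `(b, e)` can be translated via `map_back(spans, b, e)` into the filename and offsets of the original transcription.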
In both cases, DKPro Cassis can help you load, manipulate, and save your data in the UIMA CAS XMI or UIMA CAS JSON formats.
Advice provided, I hope it helps.
Hi, we have a tagging dilemma, and we could use your advice. We tag a transcription text according to audio files (of about an hour each). Up until now, we uploaded files into INCEpTION, tagged the entire text in the file (while listening to the audio file), and exported them. However, now we only need to tag 1-2 sentences in a file (they appear randomly, but we find them based on their indices). We were wondering what would be the best way to do that, because we have hundreds of these files.
Many thanks for your help!