inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Advice: tagging many files #4142

Closed 2xXLunAXx2 closed 1 year ago

2xXLunAXx2 commented 1 year ago

Hi, we have a tagging dilemma and could use your advice. We tag transcription text against audio files (about an hour each). Up until now, we uploaded files into INCEpTION, tagged the entire text in each file (while listening to the audio), and exported them. However, now we only need to tag 1-2 sentences per file (they appear at random positions, but we find them based on their indices). We were wondering what the best way to do that would be, because we have hundreds of these files.

  1. Should we upload each file separately? In this case, we'll have hundreds of files and it will be harder to find anything.
  2. Should we find the sentences we want to tag and merge them into one file? The problem here is that we need to know somehow the name of the file which each sentence came from, because we need to hear the corresponding audio. Is there a way to mention it in Inception?
  3. Any other options you can think of?

Many thanks for your help!

reckart commented 1 year ago

I think those are basically the two options. But there may be details to make them more attractive / viable.

For option 1, where you upload all the files, you could pre-process them so that the sentences you want the users to annotate are already marked with an annotation created outside the tool. Then you could instruct the annotators to use the search sidebar to find those sentences and annotate them. This way, nobody would need to hunt for the target sentences manually.
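As a rough illustration of the pre-processing step for option 1, the sketch below computes character offsets for the sentences to be marked. It assumes the transcript has one sentence per line and that the "indices" are zero-based sentence numbers; both are guesses about your data, and `sentence_spans` and `target_spans` are hypothetical helper names. It is plain Python and does not call INCEpTION or DKPro Cassis; the resulting offsets are what you would then write out as marker annotations.

```python
from typing import List, Tuple


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (begin, end) character offsets of every non-blank line.

    Assumption: one sentence per line. Blank lines are skipped but still
    counted when advancing the offset, so the spans index into `text`.
    """
    spans = []
    offset = 0
    for line in text.splitlines(keepends=True):
        content = line.rstrip("\r\n")
        if content.strip():
            spans.append((offset, offset + len(content)))
        offset += len(line)
    return spans


def target_spans(text: str, indices: List[int]) -> List[Tuple[int, int]]:
    """Pick the spans of the sentences that should be pre-annotated."""
    spans = sentence_spans(text)
    return [spans[i] for i in indices]


# Hypothetical transcript; sentences 1 and 2 (zero-based) are the targets.
transcript = "Hello there.\nHow are you today?\n\nFine, thanks.\n"
targets = target_spans(transcript, [1, 2])
for begin, end in targets:
    print(begin, end, repr(transcript[begin:end]))
```

Each `(begin, end)` pair could become one marker annotation in the imported document, which annotators can then locate through the search sidebar.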

For option 2, where you pre-process the data to extract the relevant sentences and place them into new (batch) documents, you should also create a dedicated annotation layer to capture the source of the data (e.g. original filename, begin/end). That layer does not need to be visible to the annotators; it only needs to be defined in the project settings. Once your annotators are done, you can use this source information to map the annotations back to the original data.
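The bookkeeping behind option 2 could look roughly like the sketch below. It builds one batch text from selected sentences, records where each sentence came from, and maps an annotation made in the batch text back to the original file. The names `SourceSpan`, `build_batch`, and `map_back`, as well as the file names and texts, are made up for illustration. The code is plain Python; in a real project the provenance records would be stored in the dedicated source layer instead of a Python list.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SourceSpan:
    """Provenance of one extracted sentence (hypothetical record type)."""
    filename: str
    src_begin: int
    src_end: int
    batch_begin: int
    batch_end: int


def build_batch(
    documents: Dict[str, str], wanted: List[Tuple[str, int, int]]
) -> Tuple[str, List[SourceSpan]]:
    """Join the wanted sentences with newlines and record their origin."""
    parts, provenance, cursor = [], [], 0
    for filename, begin, end in wanted:
        sentence = documents[filename][begin:end]
        provenance.append(
            SourceSpan(filename, begin, end, cursor, cursor + len(sentence))
        )
        parts.append(sentence)
        cursor += len(sentence) + 1  # +1 for the newline separator
    return "\n".join(parts), provenance


def map_back(
    provenance: List[SourceSpan], begin: int, end: int
) -> Tuple[str, int, int]:
    """Translate batch offsets into (filename, begin, end) in the source."""
    for span in provenance:
        if span.batch_begin <= begin and end <= span.batch_end:
            shift = span.src_begin - span.batch_begin
            return span.filename, begin + shift, end + shift
    raise ValueError("annotation is not inside a single extracted sentence")


# Hypothetical transcripts and target sentence offsets.
documents = {
    "call_001.txt": "Noise. Please hold the line. Noise.",
    "call_002.txt": "Intro. Thank you for calling.",
}
wanted = [("call_001.txt", 7, 28), ("call_002.txt", 7, 29)]
batch_text, provenance = build_batch(documents, wanted)

# An annotator marks "calling" in the batch document (offsets 36 to 43).
origin = map_back(provenance, 36, 43)
print(origin)
```

Because the filename travels with every sentence, annotators (or a script) can always tell which audio file belongs to a given line of the batch document.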

In both cases, DKPro Cassis can help you load, manipulate, and save your data in the UIMA CAS XMI or UIMA CAS JSON formats.

reckart commented 1 year ago

Advice provided, I hope it helps.