inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Is there a way to correct text and annotate at the same time? #2242

Closed whisere closed 3 years ago

whisere commented 3 years ago

Is your feature request related to a problem? Please describe. At the moment it seems that the text can only be annotated (or is there a way to correct it too?). It would be good to be able to correct text from OCR and similar sources while annotating it.

Describe the solution you'd like: correcting/changing text content and annotating text at the same time.

Or is there another tool that can do this? Thanks!

reckart commented 3 years ago

The approach to take currently in INCEpTION is to treat corrections as annotations as well. Consider that the tool is a multi-user tool where multiple annotators typically annotate the same text in parallel - they can't really go about and change the text all the time. By treating corrections as annotations, you can compare the corrections between users and adjudicate them. Actually applying the corrections to the text can be done as a post-hoc external step. One way of doing it is to use the built-in "Orthography" layer which should be compatible with the DKPro Core SofaChangeAnnotation and ApplyChangesAnnotator.

Of course the question about fixing token and sentence boundaries may still remain. We have started a very experimental branch that allows displaying and changing token and sentence annotations in the editor, but it is not yet usable. So right now that means that if you work with very dirty text, you may have to define your own custom layers as "character-level" layers and allow crossing sentence boundaries so you have maximum flexibility. Built-in recommenders are still bound to sentences/tokens - with external ones you should be more flexible and be able to ignore them if necessary.

whisere commented 3 years ago

Thank you!! If I use the character-level annotation and crossing of sentence boundaries mentioned above to correct text (I will give it a try, thanks!) and a remote recommender to annotate text at the same time (sentence sentiment annotation, for example), how do we export both the corrected text and the annotations into one file? Is it possible to have an example like this one: https://colab.research.google.com/github/inception-project/inception-project.github.io/blob/master/_example-projects/python/INCEpTION_Annotations_as_one_sentence_and_label_per_line.ipynb or advice on how to do it? Thanks!

reckart commented 3 years ago

If you follow the approach explained above and want the external recommender to take the changes into account, you'd have to implement your external recommender such that it interprets your change annotations and applies them to the text before training. The external recommender gets all the annotation data, so it can also access any change annotations you make. So it should be possible, but it is not exactly the most convenient and smooth procedure (yet).

whisere commented 3 years ago

Thank you! I have successfully exported the annotations from the external recommender (a remote sentiment classifier) to a file by following the link above, but I am not sure how to also include the corrected text from the Orthography layer in the same file. (I have managed to use the Orthography layer with "character-level" layers and allowed crossing sentence boundaries to correct text. Thank you!) I don't need the external recommender to interpret the corrected text, I just need to export both the corrected text and the annotations in the same file. Thanks!

whisere commented 3 years ago

E.g. current export:

annotation	sentence
positive	It looks lively.

to (after correcting the text with the Orthography layer, lively -> lovely):

annotation	sentence
positive	It looks lovely

reckart commented 3 years ago

Assuming you annotated lively as an Orthography annotation with the operation replace and the correction lovely.

Then you'd need to implement a piece of code which looks at this annotation and recognizes that lively needs to be replaced with lovely. As mentioned before, the Java-based DKPro Core ApplyChangesAnnotator can do that, but if you are in a Python environment, you'd at the moment have to build something yourself - we have no ready-made Python implementation of the ApplyChangesAnnotator.
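
A minimal sketch of what such a piece of code could look like in plain Python, assuming the correction is available as a begin/end offset pair plus an operation and a value. `Change` and `apply_change` are hypothetical names made up for illustration; they are not part of INCEpTION, DKPro Core or any library. The operation names are the ones mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical stand-in for one Orthography (SofaChangeAnnotation) annotation."""
    begin: int
    end: int
    operation: str
    value: Optional[str] = None


def apply_change(text: str, change: Change) -> str:
    """Apply a single change to text and return the corrected text."""
    value = change.value or ""
    if change.operation == "replace":
        return text[:change.begin] + value + text[change.end:]
    if change.operation == "delete":
        return text[:change.begin] + text[change.end:]
    if change.operation == "insert_before":
        return text[:change.begin] + value + text[change.begin:]
    if change.operation == "insert_after":
        return text[:change.end] + value + text[change.end:]
    raise ValueError(f"unsupported operation: {change.operation!r}")


text = "It looks lively."
begin = text.index("lively")
change = Change(begin=begin, end=begin + len("lively"), operation="replace", value="lovely")
corrected = apply_change(text, change)
print(corrected)
```

This only covers a single change; once several changes touch the same text, the offsets of later changes shift and have to be accounted for.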

whisere commented 3 years ago

Thanks! Can we use something similar to the example in the link https://colab.research.google.com/github/inception-project/inception-project.github.io/blob/master/_example-projects/python/INCEpTION_Annotations_as_one_sentence_and_label_per_line.ipynb with the exported UIMA CAS XMI 1.0 file? I am using the built-in Orthography layer in INCEpTION. I have something like this in the exported file:

<type8:SofaChangeAnnotation xmi:id="7056" sofa="1" begin="103" end="114" value="description" operation="replace"/>
<type8:SofaChangeAnnotation xmi:id="7063" sofa="1" begin="126" end="160" operation="delete"/>
<type8:SofaChangeAnnotation xmi:id="7077" sofa="1" begin="600" end="601" value="s"/>
<type8:SofaChangeAnnotation xmi:id="7091" sofa="1" begin="601" end="642" operation="delete"/>
<type8:SofaChangeAnnotation xmi:id="7084" sofa="1" begin="602" end="607"/>
<type8:SofaChangeAnnotation xmi:id="7098" sofa="1" begin="643" end="678" operation="delete"/>
<type8:SofaChangeAnnotation xmi:id="7105" sofa="1" begin="701" end="702" operation="delete"/>
<type8:SofaChangeAnnotation xmi:id="7070" sofa="1" begin="703" end="708"/>

<custom2:Sentiment xmi:id="6924" sofa="1" begin="93" end="94" Polarity="positive"/>
<custom2:Sentiment xmi:id="6929" sofa="1" begin="95" end="125" Polarity="neutral"/>
<custom2:Sentiment xmi:id="6934" sofa="1" begin="126" end="251" Polarity="neutral"/>
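
To get a feel for what is in such an export, the change annotations can also be listed with the Python standard library alone. The following is a rough sketch, not an official reader: the embedded XMI is a trimmed-down stand-in, the namespace URIs in it are assumptions for illustration, and `read_changes` is a hypothetical helper. Elements are matched by local name so the sketch does not depend on the exact prefix or URI.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for an exported XMI file. The namespace URIs are
# assumptions for illustration; a real export declares its own.
XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:type8="http:///de/tudarmstadt/ukp/dkpro/core/api/transform/type.ecore">
  <type8:SofaChangeAnnotation xmi:id="7056" sofa="1" begin="103" end="114" value="description" operation="replace"/>
  <type8:SofaChangeAnnotation xmi:id="7063" sofa="1" begin="126" end="160" operation="delete"/>
  <type8:SofaChangeAnnotation xmi:id="7077" sofa="1" begin="600" end="601" value="s"/>
  <type8:SofaChangeAnnotation xmi:id="7084" sofa="1" begin="602" end="607"/>
</xmi:XMI>"""


def read_changes(xmi_text):
    """Return (begin, end, operation, value) tuples, matching elements by local name."""
    changes = []
    for element in ET.fromstring(xmi_text).iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "SofaChangeAnnotation":
            changes.append((
                int(element.get("begin")),
                int(element.get("end")),
                element.get("operation"),  # None when the attribute is absent
                element.get("value"),      # None when the attribute is absent
            ))
    return changes


changes = read_changes(XMI)
for change in changes:
    print(change)
```

Note that some annotations carry no operation or no value attribute. These come back as None here, and whatever code consumes them has to decide how to treat such cases.
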
whisere commented 3 years ago

It looks promising with the exported UIMA CAS XMI 1.0 file. I will investigate further next week.

whisere commented 3 years ago

I had a try at extending the sentiment annotation code to include the Orthography annotations. It can't do cross-sentence corrections or handle more than one correction under each sentiment (you can use one of replace, delete, insert_before or insert_after under each sentiment/sentence), but it will work for now:

#!pip install dkpro-cassis > /dev/null

from cassis import load_typesystem, load_cas_from_xmi

with open('TypeSystem.xml', 'rb') as f:
  typesystem = load_typesystem(f)

with open('data.xmi', 'rb') as f:
  doc = load_cas_from_xmi(f, typesystem=typesystem)

#print(doc.sofa_string)

# Since Sentiment is a sentence-level annotation in INCEpTION, we get
# one annotation per sentence. So we can simply iterate over the 
# Sentiment annotations and write its polarity and the covered text
# to the output file
sentiments=[]
corrections=[]
with open('annotation-per-line.txt', 'w') as f:
  for sentiment in doc.select('webanno.custom.Sentiment'):
    f.write(f"{sentiment.begin}\t{sentiment.end}\t{sentiment.Polarity}\t{sentiment.get_covered_text()}\n")
    if(sentiment.Polarity=='positive' or sentiment.Polarity=='negative'):
        sentiments.append(sentiment)
  for correction in doc.select('de.tudarmstadt.ukp.dkpro.core.api.transform.type.SofaChangeAnnotation'):
    f.write(f"{correction.begin}\t{correction.end}\t{correction.operation}\t{correction.value}\t{correction.get_covered_text()}\n")
    corrections.append(correction)

doclen = len(doc.sofa_string)
alltext = "$" * doclen

with open('sentiment-sentence-per-line.txt', 'w') as f:
    for sentiment in sentiments:
        # only output positive and negative sentiment
        if(sentiment.Polarity=='positive' or sentiment.Polarity=='negative'):
            alltext = "$" * doclen
            alltext=alltext[:sentiment.begin] + sentiment.get_covered_text() + alltext[sentiment.end:]
            sizechange_all=0
            sentiment_changedBegin=sentiment.begin
            sentiment_changedEnd=sentiment.end
            for correction in corrections:
                #sentiment text change size
                # check if correction is inside of sentiment text, make sure correction is not across either side of the sentiment text but inside
                # can only have one of these below in each sentiment/sentence, eg can't have delete and replace at the same time in one sentence, just use delete, replace, insert_after or insert_before once
                if sentiment.begin <= correction.begin <= sentiment.end and sentiment.begin <= correction.end <= sentiment.end:
                    # default for operations not handled below, so sizechange is always defined
                    sizechange = 0
                    if(correction.operation=='delete'):
                        alltext=alltext[:correction.begin] + alltext[correction.end:]
                        sizechange = -(correction.end-correction.begin)
                    elif(correction.operation=='replace'):
                        alltext=alltext[:correction.begin] + correction.value + alltext[correction.end:]
                        sizechange = len(correction.value)-(correction.end-correction.begin)
                    elif(correction.operation=='insert_after'):
                        insertPos = correction.end
                        alltext= alltext[:insertPos]+ correction.value + alltext[insertPos:]
                        sizechange = len(correction.value)
                    elif(correction.operation=='insert_before'):
                        insertPos = correction.begin
                        alltext= alltext[:insertPos]+ correction.value + alltext[insertPos:]
                        sizechange = len(correction.value)
                    elif(correction.operation is None and correction.value is not None):
                        alltext=alltext[:correction.begin] + correction.value + alltext[correction.end:]
                        sizechange = len(correction.value)-(correction.end-correction.begin)
                    elif(correction.operation is None and correction.value is None):
                        alltext=alltext[:correction.begin] + alltext[correction.end:]
                        sizechange = -(correction.end-correction.begin)
                    sizechange_all=sizechange_all+sizechange
            sentiment_changedEnd = sentiment.end+sizechange_all
            #print (str(sentiment.begin)+":"+str(sentiment.end+sizechange_all))
            sentiment_changedText = alltext[sentiment_changedBegin:sentiment_changedEnd]
            #print if not empty or only contains space
            if sentiment_changedText.strip():
                print(sentiment.Polarity+":"+sentiment_changedText)
                f.write(f"{sentiment.Polarity}\t{sentiment_changedText}\n")
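
One possible way around the one-correction-per-sentence limit mentioned above is to apply the corrections from the end of the sentence backwards, so that the offsets of corrections further to the left are still valid when their turn comes. The sketch below illustrates this in plain Python without cassis. `apply_changes` is a hypothetical helper, corrections are given as (begin, end, operation, value) tuples with document-level offsets, and overlapping or cross-sentence corrections are not handled.

```python
def apply_changes(text, offset, changes):
    """Apply all changes lying inside text, which starts at document offset `offset`.

    changes: iterable of (begin, end, operation, value) with document-level offsets.
    """
    end_offset = offset + len(text)
    inside = [c for c in changes if offset <= c[0] and c[1] <= end_offset]
    # Work from the last change to the first so that earlier offsets stay valid.
    for begin, end, operation, value in sorted(inside, key=lambda c: c[0], reverse=True):
        b, e = begin - offset, end - offset
        value = value or ""
        if operation == "delete":
            text = text[:b] + text[e:]
        elif operation == "replace":
            text = text[:b] + value + text[e:]
        elif operation == "insert_before":
            text = text[:b] + value + text[b:]
        elif operation == "insert_after":
            text = text[:e] + value + text[e:]
        # any other operation is left untouched in this sketch
    return text


doc = "Intro. It looks lively and and nice."
sentence_begin = doc.index("It")
sentence = doc[sentence_begin:]

lively_begin = doc.index("lively")
dup_begin = doc.index("and and")
changes = [
    (lively_begin, lively_begin + len("lively"), "replace", "lovely"),
    (dup_begin, dup_begin + len("and "), "delete", None),
]

corrected = apply_changes(sentence, sentence_begin, changes)
print(corrected)
```

With cassis, the tuples could presumably be built from the begin, end, operation and value features of each SofaChangeAnnotation, and text and offset from each Sentiment annotation's covered text and begin.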