qurator-spk / neat

Named entity annotation tool
Apache License 2.0
support token merges/splits #8

Closed cneud closed 4 years ago

cneud commented 4 years ago

Sometimes it is necessary to either merge or split tokens, e.g. due to segmentation errors.

There are 2 basic operations that should be supported here:

a) merge: appends the text content of the current row to that of the row above, deletes the current row and updates the offsets

b) split: adds a new row below the current one, copies the text content of the current row into it and updates the offsets
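To make the two operations concrete, here is a minimal Python sketch. It is not the tool's actual code: the `Row` class, the `reindex` helper and the reading of "offsets" as character positions (tokens separated by a single space) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical row model: token text plus character offsets.
    text: str
    start: int = 0
    end: int = 0


def reindex(rows):
    """Recompute offsets, assuming tokens are separated by one space."""
    pos = 0
    for row in rows:
        row.start = pos
        row.end = pos + len(row.text)
        if row.text:  # empty rows take up no space and no separator
            pos = row.end + 1
    return rows


def merge(rows, i):
    """Append the text of row i to row i - 1, then delete row i."""
    if i <= 0 or i >= len(rows):
        raise IndexError("merge needs a row above the current one")
    rows[i - 1].text += rows[i].text
    del rows[i]
    return reindex(rows)


def split(rows, i):
    """Insert a new row below row i that starts as a copy of its text."""
    rows.insert(i + 1, Row(rows[i].text))
    return reindex(rows)


# A segmentation error split "Berlin" into two tokens; merge repairs it.
rows = reindex([Row("Ber"), Row("lin"), Row("ist")])
merge(rows, 1)
print([(r.text, r.start, r.end) for r in rows])
```

After a split, both rows hold the same text, and the annotator would then edit each one by hand. Recomputing all offsets from scratch is the simplest choice here; a real implementation may only need to shift the rows below the edit.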

cneud commented 4 years ago

added in 1dd5acd66f7ef9e0bced41f81eabc8ac30f2508a

snmnzl commented 4 years ago

First testing sessions show that OCR correction would require two more options for more efficient handling:

c) add: same as b) split, but the new row is left empty instead of receiving a copy of the text content from the row above.

d) delete: deletes the current row together with its text content and updates the offsets.

Both can be worked around with functions a) and b), but this takes a lot of time.
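For illustration, the two requested operations could look like this in Python. This is only a sketch under assumptions: the `Row` class, the `reindex` helper and the character-offset model (tokens separated by a single space) are hypothetical and not taken from the tool's code.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical row model: token text plus character offsets.
    text: str
    start: int = 0
    end: int = 0


def reindex(rows):
    """Recompute offsets, assuming tokens are separated by one space."""
    pos = 0
    for row in rows:
        row.start = pos
        row.end = pos + len(row.text)
        if row.text:  # empty rows take up no space and no separator
            pos = row.end + 1
    return rows


def add(rows, i):
    """Insert an empty row below row i (like split, but without the copy)."""
    rows.insert(i + 1, Row(""))
    return reindex(rows)


def delete(rows, i):
    """Remove row i together with its text content."""
    del rows[i]
    return reindex(rows)


rows = reindex([Row("Berlin"), Row("~"), Row("ist")])
delete(rows, 1)  # drop a spurious OCR token in one step
add(rows, 0)     # open an empty row for a token the OCR missed
print([(r.text, r.start, r.end) for r in rows])
```

Each operation is a single list edit followed by a reindex, which is where the time saving over the workaround would come from: emulating delete with a) merge still leaves the unwanted text to erase by hand, and emulating add with b) split leaves a copied token to clear.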