qurator-spk / neat

Named entity annotation tool
Apache License 2.0

lines are being cut off (after saving?) #38

Closed by snmnzl 4 years ago

snmnzl commented 4 years ago

We noticed that the final parts of the plain text (OCR) of the newspaper page in example.tsv were missing after saving the annotation result of each session (see the different states of the file attached). We have not figured out the pattern yet; our best guess is that it is connected to the split/merge function. example.tsv_states.zip

labusch commented 4 years ago

There seems to be a problem with line breaks. For instance, in 1Annotation.tsv at line 45 there is a line break that is not supposed to exist.
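A stray line break like this can be located by checking the number of tab-separated fields per row. The following is only a minimal sketch, assuming every row should have as many fields as the header row; the function name `find_malformed_rows` and the column names in the sample are hypothetical and not taken from neat itself.

```python
def find_malformed_rows(tsv_text, expected_columns=None):
    """Return (line_number, field_count) for every row whose number of
    tab-separated fields differs from the expected count.

    Assumption: the first line is a header that defines the expected count
    unless expected_columns is given explicitly.
    """
    lines = tsv_text.split("\n")
    # A file that ends with a newline yields one empty trailing element.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines:
        return []
    if expected_columns is None:
        expected_columns = len(lines[0].split("\t"))

    malformed = []
    for number, line in enumerate(lines, start=1):
        field_count = len(line.split("\t"))
        if field_count != expected_columns:
            malformed.append((number, field_count))
    return malformed


# Hypothetical sample: the row for token 2 is broken by a stray line break.
sample = "No.\tTOKEN\tNE-TAG\n1\tBerlin\tB-LOC\n2\tist\n\tO\n3\tgroß\tO\n"
print(find_malformed_rows(sample))
```

A broken row shows up as two consecutive entries, because the stray line break splits one record into two short lines.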

snmnzl commented 4 years ago

Right, this happens regularly in all our files, and we have not found the source of the error. In neat, the lines are displayed correctly.

cneud commented 4 years ago

fixed in a1bd8afc9dd764c87c24992143632b9aa11b3d07

snmnzl commented 4 years ago

@cneud Unfortunately, we still see the line count change after editing. The unaltered original file 27646518_1892-07-05_21_335_005.tsv has 4887 lines. After editing (see file attached), my file is left with 4811 lines, and its last three tokens are not the same as in the original. 27646518_1892-07-05_21_335_005_edit.zip

cneud commented 4 years ago

@snmnzl Does this issue still occur with the current version of neat?

JZinck commented 4 years ago

@cneud We still encounter this problem on a regular basis: the last files I edited were complete in NEAT and were saved. By the time they were either uploaded or reopened they had lines missing.

I attached two files to demonstrate: the first one is the first draft of the master data (ending in _M), which ends correctly at line 2351 when opened in NEAT. A change was then made to a set of tokens (5 tags were removed) and the data was saved (file ending in _MM); the resulting file ends at line 2346. Creating_Masterdata_2436020X_1897-05-07_0_212_002.zip

cneud commented 4 years ago

@JZinck Ouch, too bad this apparently still occurs. Thanks for the extra information and example data. We will investigate the cause of the error.

Just to confirm, the file was only ever opened in neat and not in any other editor/tool?

JZinck commented 4 years ago

@cneud Correct. We have been very careful to only open them in neat.

labusch commented 4 years ago

I finally found the cause of this one. The problem only occurred when the tags were edited via the accordion menu. It did not show up when the editing was done with the hot keys.

The root cause was a missing trim() on a string, which resulted in invalid line breaks being inserted into the TSV file.
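To illustrate this class of bug (this is not the actual neat code), here is a small Python sketch in which `str.strip()` plays the role of `trim()`. The helpers `build_row` and `serialize` and the sample rows are hypothetical; the assumption is that a tag value picked from a menu arrives with a trailing newline.

```python
def build_row(fields, trim=True):
    """Join the fields of one record with tabs.

    With trim=True, leading and trailing whitespace (including line breaks)
    is removed from each field first.
    """
    if trim:
        fields = [field.strip() for field in fields]
    return "\t".join(fields)


def serialize(rows, trim=True):
    """Write one record per line, terminated by a newline."""
    return "\n".join(build_row(row, trim) for row in rows) + "\n"


# Hypothetical input: the tag of the first record carries a trailing newline.
rows = [["1", "Berlin", "B-LOC\n"], ["2", "ist", "O"]]

broken = serialize(rows, trim=False)
fixed = serialize(rows, trim=True)

print(len(broken.splitlines()))  # line count without trimming
print(len(fixed.splitlines()))   # line count with trimming
```

Without trimming, the two records are written as three lines, one of them empty, so a reader that expects one record per line no longer sees the file as it was saved. With trimming, the output has exactly one line per record.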

snmnzl commented 4 years ago

@labusch Sounds great, thanks! We will get back to you in case it still occurs. Since we use the accordion menu constantly, it would show up quickly. But for now we take this as solved!