inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Annotation screen goes blank after selecting a numeric text span #2030

Closed david-waterworth closed 2 years ago

david-waterworth commented 3 years ago

I'm trying to annotate text such as TEF 4-3 Status using the NER layer: TEF should get the class equip, 4-3 the class id, and Status the class point.

I may have messed up the file; I'm using WebAnno TSV 3.2. Whenever I select the tokens 4-3 and choose value="id", the pane showing the document contents goes blank. There don't seem to be any errors in the log, and the UI is still responsive: I can export the document and the annotation has been applied, but I cannot continue annotating.

My guess is that it has something to do with the entity mention being numeric text (or even a numeric expression), but it's also possible that I generated the file incorrectly (I've attached a sample).

This is an example after I reload:

[screenshot]

And this is what happens as soon as I click on identifier (I changed the value from id to identifier in case id was reserved):

[screenshot]

I've attached a test file. The file loads into INCEpTION and also into https://pypi.org/project/web-anno-tsv/, so I'm assuming I've created it correctly, but I don't really know.

test.tsv.zip

reckart commented 3 years ago

That sounds like the visualization crashes during rendering. Which browser are you using and which version of INCEpTION?

david-waterworth commented 3 years ago

Chrome and inception-app-standalone-0.17.2.jar

I did notice there's a newer release (inception-app-webapp-0.18.1-standalone.jar), but it won't run on my system because it requires a newer Java version (I'm using openjdk version "1.8.0_282"), so tomorrow I'll update Java and see if that helps.

reckart commented 3 years ago

Ok. If it doesn't, please check in the Web Developer Tools of Chrome if you might see any errors in the Console view.


https://developers.google.com/web/tools/chrome-devtools/shortcuts

david-waterworth commented 3 years ago

With Java 11 and INCEpTION 0.18.1, I get the error below in the console view, so it seems you're correct. It could still be an issue with my file, so I'll look at it a little more closely.

    visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:3452 Rendering terminated due to: IndexSizeError: Failed to execute 'getEndPositionOfChar' on 'SVGTextContentElement': The charnum provided (0) is greater than or equal to the maximum bound (0).
    Error: Failed to execute 'getEndPositionOfChar' on 'SVGTextContentElement': The charnum provided (0) is greater than or equal to the maximum bound (0).
        at http://localhost:8080/wicket/resource/de.tudarmstadt.ukp.clarin.webanno.brat.resource.BratVisualizerResourceReference/visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:1356:8
        at Fragment.<anonymous> (http://localhost:8080/wicket/resource/de.tudarmstadt.ukp.clarin.webanno.brat.resource.BratVisualizerResourceReference/visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:1175:1)
        at Function.each (http://localhost:8080/wicket/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-2.2.4-ver-F9EE266EF993962AD59E804AD9DEBE66.js:2:2861)
        at SVGTextElement.<anonymous> (http://localhost:8080/wicket/resource/de.tudarmstadt.ukp.clarin.webanno.brat.resource.BratVisualizerResourceReference/visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:1174:3)
        at Function.each (http://localhost:8080/wicket/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-2.2.4-ver-F9EE266EF993962AD59E804AD9DEBE66.js:2:2861)
        at a.fn.init.each (http://localhost:8080/wicket/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-2.2.4-ver-F9EE266EF993962AD59E804AD9DEBE66.js:2:845)
        at getTextMeasurements (http://localhost:8080/wicket/resource/de.tudarmstadt.ukp.clarin.webanno.brat.resource.BratVisualizerResourceReference/visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:1170:34)
        at getTextAndSpanTextMeasurements (http://localhost:8080/wicket/resource/de.tudarmstadt.ukp.clarin.webanno.brat.resource.BratVisualizerResourceReference/visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:1198:17)
        at renderDataReal (http://localhost:8080/wicket/resource/de.tudarmstadt.ukp.clarin.webanno.brat.resource.BratVisualizerResourceReference/visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:1528:13)
        at rerender (http://localhost:8080/wicket/resource/de.tudarmstadt.ukp.clarin.webanno.brat.resource.BratVisualizerResourceReference/visualizer-ver-6FE6C5C69300B2B09BCF3952BBD91BFE.js:3447:1)

There's also a warning appearing in my bash shell; I'm not sure if that's related:

2021-02-25 07:27:12 WARN [admin] ResourceReferenceRegistry - A ResourceReference wont be created for a resource with key [scope: de.tudarmstadt.ukp.inception.support.vue.VueBehavior; name: vue3-sfc-loader.js.map; locale: null; style: null; variation: null] because it cannot be located.

reckart commented 3 years ago

It could also be a bug in the rendering code - we had a similar issue on Firefox recently: https://github.com/inception-project/inception/issues/1849

reckart commented 3 years ago

You can ignore the warning in the bash log. It appears when Chrome tries to load the source map file, which we do not have for the vue3-sfc-loader.

david-waterworth commented 3 years ago

What I've found works is to remove the whitespace tokens from my input file, i.e.

#Text=EXF-LD 1_1
1-1	0-3	EXF
1-2	3-4	-
1-3	4-6	LD
1-4	6-7	 
1-5	7-8	1
1-6	8-9	_
1-7	9-10	1

Becomes (with the whitespace token 1-4 removed and the remaining tokens renumbered):

#Text=EXF-LD 1_1
1-1	0-3	EXF
1-2	3-4	-
1-3	4-6	LD
1-4	7-8	1
1-5	8-9	_
1-6	9-10	1
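The workaround amounts to never emitting whitespace as a token. A minimal Java sketch of that idea, assuming a simple tokenization rule (runs of letters, runs of digits, and every other non-whitespace character as its own token) that happens to reproduce the token IDs, offsets and texts of the whitespace-free version above. The class and method names are hypothetical, this is not INCEpTION's own tokenizer, and it only writes the first three columns (token ID, character offsets, token text) of each token line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: emits WebAnno TSV token lines (ID, offsets, text) for one
// sentence, skipping whitespace so that no token starts or ends with a blank.
final class WhitespaceFreeTokens {

    static List<String> tokenLines(int sentence, String text) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        int tokenNo = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but never becomes a token itself
                continue;
            }
            int begin = i;
            if (Character.isLetter(c)) {
                while (i < text.length() && Character.isLetter(text.charAt(i))) i++;
            } else if (Character.isDigit(c)) {
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
            } else {
                i++; // any other character (e.g. '-' or '_') is a single-character token
            }
            tokenNo++;
            // Offsets are begin-inclusive, end-exclusive, as in the TSV examples here
            lines.add(sentence + "-" + tokenNo + "\t" + begin + "-" + i + "\t"
                    + text.substring(begin, i));
        }
        return lines;
    }

    public static void main(String[] args) {
        tokenLines(1, "EXF-LD 1_1").forEach(System.out::println);
    }
}
```

For "EXF-LD 1_1" this yields six tokens (EXF, -, LD, 1, _, 1), with the space at offset 6 falling between tokens 1-3 and 1-4 instead of forming a token of its own.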

I also read that certain characters need to be escaped (https://webanno.github.io/webanno/releases/3.4.5/docs/user-guide.html#_reserved_characters), but when I attempted this (i.e. for token 1-5) the file wouldn't load. It's not clear to me whether I'm supposed to escape the text (EXF-LD 1_1 -> EXF-LD 1\_1), the token (_ -> \_), or both, and whether that means I have to update the offsets. But it actually seems to work better with no escaping at all, just with the whitespace tokens removed.

reckart commented 3 years ago

INCEpTION (and WebAnno, for that matter) does not support tokens that start or end with whitespace, or that consist exclusively of whitespace.
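When generating TSV files by hand, one way to catch such tokens before importing is to check each token's offsets against the document text. A minimal sketch, assuming the token IDs and offsets have already been parsed from the token lines (parsing is omitted) and that offsets index into the full document text; the class and method names are hypothetical, not part of INCEpTION.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-import check: reports tokens that are empty, or that start or
// end with whitespace (which also covers whitespace-only tokens).
final class TokenWhitespaceCheck {

    static final class Token {
        final String id;
        final int begin; // inclusive character offset into the document text
        final int end;   // exclusive character offset into the document text

        Token(String id, int begin, int end) {
            this.id = id;
            this.begin = begin;
            this.end = end;
        }
    }

    static List<String> problems(String documentText, List<Token> tokens) {
        List<String> problems = new ArrayList<>();
        for (Token t : tokens) {
            String covered = documentText.substring(t.begin, t.end);
            if (covered.isEmpty()) {
                problems.add(t.id + ": empty token");
            } else if (Character.isWhitespace(covered.charAt(0))
                    || Character.isWhitespace(covered.charAt(covered.length() - 1))) {
                problems.add(t.id + ": starts or ends with whitespace");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Offsets taken from the first (problematic) file in this thread
        List<Token> tokens = List.of(new Token("1-3", 4, 6), new Token("1-4", 6, 7),
                new Token("1-5", 7, 8));
        problems("EXF-LD 1_1", tokens).forEach(System.out::println);
    }
}
```

Run against the original file's offsets, only token 1-4 (the single space at 6-7) is reported.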

david-waterworth commented 3 years ago

OK, that's probably my issue then. I'm still not 100% sure how to properly escape the text, but what I've done (not escaping anything) does seem to be working.

reckart commented 3 years ago

Can you make an example where you think something might need to be escaped?

david-waterworth commented 3 years ago

I've attached an example, token 1-5 is an underscore. It's in the list of reserved characters \,[,],|,_,->,;,\t,\n,* if I'm reading correctly.

I have many examples which contain _ or | (and some which contain \).

#Text=EXF-LD 1_1
1-1	0-3	EXF
1-2	3-4	-
1-3	4-6	LD
1-4	7-8	1
1-5	8-9	_
1-6	9-10	1

test.tsv.zip

reckart commented 3 years ago

I have dropped the text "_ * \n \t [ ]" (mind that \n is a linebreak and \t is a tab) into the Tsv3XSerializer and this is what it produces:

#Text=_ * 
#Text= \t [ ]
1-1 0-1 _   
1-2 2-3 *   
1-3 8-9 [   
1-4 10-11   ]   

So the tab is escaped in the #Text line. The line break is realized as an actual line break starting a new #Text line. In the token lines, the characters are not escaped. It looks like in the text, only tab, line feed, form feed, carriage return, backspace and backslash are escaped. However, in the annotation columns, the special characters mentioned in the documentation are escaped.
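The observed #Text behaviour can be mimicked in a few lines. A minimal sketch, assuming that a line break inside the sentence starts a new "#Text=" line and that the other listed characters are written as Java-style backslash escapes (\\, \t, \f, \r, \b). This is a hypothetical reconstruction from the output above, not the actual Tsv3XSerializer code, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the observed #Text line behaviour (not Tsv3XSerializer).
final class TextLineSketch {

    // Assumed escapes for the characters listed above; line feeds never reach
    // this method because textLines() splits on them first.
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t"); break;
                case '\f': sb.append("\\f"); break;
                case '\r': sb.append("\\r"); break;
                case '\b': sb.append("\\b"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Each line of the sentence text becomes its own "#Text=" line.
    static List<String> textLines(String sentenceText) {
        List<String> lines = new ArrayList<>();
        for (String part : sentenceText.split("\n", -1)) { // -1 keeps trailing empty parts
            lines.add("#Text=" + escapeText(part));
        }
        return lines;
    }

    public static void main(String[] args) {
        textLines("_ * \n \t [ ]").forEach(System.out::println);
    }
}
```

For the test text this prints the same two #Text lines as the serializer output above, with the tab written as \t. Note that the token text column is not touched by this step.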

reckart commented 3 years ago

Hm, yes, the escaping is only in the #Text lines - but in the text column (i.e. third column), there is no escaping at all:

        // Write unit text
        aOut.print(doc.getJCas().getDocumentText().substring(aUnit.getBegin(), aUnit.getEnd()));
        aOut.printf(FIELD_SEPARATOR);

I guess this is something to be changed in a future iteration of the format and to be properly documented.

david-waterworth commented 3 years ago

Thanks. I don't think [ and ] are reserved characters. By my reading of the documentation I linked to above, the comma is, and since the documentation displays the reserved characters as a comma-separated list, the reserved comma itself is enclosed in [ ] to try to avoid confusion (I didn't notice this originally, so it didn't work).

So I think the list of reserved characters is supposed to be

\,|_;* \t and \n (along with the -> character?)

But as we've seen, Tsv3XSerializer (is this what you use to parse the files?) doesn't escape most of these characters.

reckart commented 3 years ago

The Tsv3XSerializer is part of the code that does the serialization - as is the function below from the Java class Escaping which handles the escaping of the feature column values. Here you can see exactly what is escaped in which way.

    public static String escapeValue(String aValue)
    {
        return StringUtils.replaceEach(aValue,
                new String[] { "\\", "[", "]", "|", "_", "->", ";", "\t", "\n", "*" },
                new String[] { "\\\\", "\\[", "\\]", "\\|", "\\_", "\\->", "\\;", "\\t", "\\n",
                        "\\*" });
    }
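For experimenting without Apache Commons Lang, the replacement table above can be mirrored with plain JDK code. A minimal sketch: escapeValue follows the same mapping in a single left-to-right pass (like StringUtils.replaceEach, so backslashes it inserts are not escaped again), while unescapeValue is a hypothetical inverse added for illustration and is not taken from the INCEpTION code base.

```java
// JDK-only sketch mirroring the replacement table of Escaping.escapeValue.
final class FeatureValueEscaping {

    static String escapeValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '-' && i + 1 < value.length() && value.charAt(i + 1) == '>') {
                sb.append("\\->"); // the two-character arrow is escaped as a unit
                i++;
                continue;
            }
            switch (c) {
                case '\\': case '[': case ']': case '|': case '_': case ';': case '*':
                    sb.append('\\').append(c); // prefix reserved characters with a backslash
                    break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical inverse: "\t" and "\n" become tab and line feed; any other
    // backslash-prefixed character is taken literally.
    static String unescapeValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                char next = value.charAt(++i);
                sb.append(next == 't' ? '\t' : next == 'n' ? '\n' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeValue("EXF-LD 1_1"));
        System.out.println(escapeValue("a->b|c"));
    }
}
```

Here escapeValue("EXF-LD 1_1") gives EXF-LD 1\_1, while the plain hyphen stays unescaped because only the two-character sequence -> is reserved. As discussed above, this escaping applies to the feature (annotation) columns, not to the token text column.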
reckart commented 2 years ago

Assuming that the issue was resolved since there was no more feedback, and thus closing it.