inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

InlineXML output file does not contain tag for spans across sentences. #1829

Closed: bsharma5 closed this issue 3 years ago

bsharma5 commented 4 years ago

Describe the bug: Spans annotated across a sentence boundary are not shown in the output InlineXML file. Possibly a sentence splitter issue?

To reproduce: Create a simple layer of type Span with the option "Allow crossing sentence boundaries" selected. Assign a feature to this layer with a tagset of string values.

Try annotating the following simple file with two lines:

Do you own a house? Yes
Do you commute to work? No

Annotate the span "own a house? Yes" with, say, the tag value "A". In the output file exported as InlineXML, the tag "A" is not present. If I annotate only "own a house", the tag is present in the InlineXML export.

Expected behavior: It looks like this issue is caused by the way the text is split into sentences. The sentence splitter currently splits at the question mark (?), and since the span crosses the sentence boundary, the output does not show the span.

Note: Can the tool be configured to ignore the question mark during sentence splitting? Or is it a bug that the output does not contain the tag?

Screenshots: N/A


reckart commented 4 years ago

Inline XML format does not support overlapping annotations. If you have an annotation that crosses a sentence boundary (sentences are also annotations), then you have overlapping annotations. This is why the annotation is not there. The format is not suitable for your case.
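To make the overlap argument concrete, here is a minimal Python sketch. It assumes a hypothetical, naive splitter that ends a sentence at ".", "?" or "!" (this is not INCEpTION's actual sentence splitter) and checks whether the annotated span partially overlaps a sentence. Two spans that overlap without one containing the other cannot both be written as properly nested inline XML elements, which is why an exporter has to drop one of them.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def naive_sentences(text: str) -> List[Span]:
    """Hypothetical splitter: a sentence runs from a non-space character up to '.', '?' or '!'."""
    return [m.span() for m in re.finditer(r"\S[^.?!]*[.?!]?", text)]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other.

    Such a pair cannot be represented as nested inline XML elements.
    """
    overlap = a[0] < b[1] and b[0] < a[1]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not (a_in_b or b_in_a)


text = "Do you own a house? Yes\nDo you commute to work? No"
annotation = (7, 23)  # covers "own a house? Yes"

for sentence in naive_sentences(text):
    # Prints each sentence span, its text, and whether the annotation crosses it.
    print(sentence, repr(text[sentence[0]:sentence[1]]), crosses(annotation, sentence))
```

With this splitter the sentences are "Do you own a house?", "Yes\nDo you commute to work?" and "No", and the annotation crosses the first two of them.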

reckart commented 4 years ago

You cannot configure the sentence splitting right now.

However, if your plain text input file should be interpreted as "one sentence per line", then you have the option of importing it using the "Plain text (one sentence per line)" format. Then INCEpTION will simply treat each line as a sentence and not look for sentence markers.
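A minimal sketch of the idea, assuming that "one sentence per line" simply means that each non-empty line becomes one sentence (INCEpTION's actual importer may treat whitespace and empty lines differently). The helper names `line_sentences` and `within_one_sentence` are hypothetical. With line-based sentences, the span "own a house? Yes" lies entirely inside the first sentence, so it no longer crosses a sentence boundary.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def line_sentences(text: str) -> List[Span]:
    """Hypothetical 'one sentence per line' segmentation: each non-empty line is a sentence."""
    sentences = []
    offset = 0
    for line in text.splitlines(keepends=True):
        content = line.rstrip("\r\n")
        if content.strip():
            sentences.append((offset, offset + len(content)))
        offset += len(line)  # advance past the line including its line break
    return sentences


def within_one_sentence(span: Span, sentences: List[Span]) -> bool:
    """True if the span fits completely inside a single sentence."""
    begin, end = span
    return any(s_begin <= begin and end <= s_end for s_begin, s_end in sentences)


text = "Do you own a house? Yes\nDo you commute to work? No"
print(line_sentences(text))                                # [(0, 23), (24, 50)]
print(within_one_sentence((7, 23), line_sentences(text)))  # True
```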

bsharma5 commented 4 years ago

I understand now that sentences are also annotations. So how do I export in order to see such cross-sentence annotations?

(In reply to Richard Eckart de Castilho's comment of Fri, Oct 16, 2020, 11:37 AM.)

reckart commented 4 years ago

You can export, for example, in UIMA CAS XMI or WebAnno TSV 3 format.

reckart commented 4 years ago

If you need to post-process the exported data, you might want to go with XMI and have a look at DKPro cassis, which allows you to load the data in a Python script. You could then write a Python script that transforms XMI into any target format you can come up with.
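As a rough sketch of such a script, assuming the `dkpro-cassis` package is installed and that you have the XMI file together with its `TypeSystem.xml` from the export: the layer type name (for a custom layer something like `webanno.custom.MyLayer`) and the feature name are placeholders that you would need to look up in your own type system. The function names `load_spans_from_xmi` and `to_milestone_xml` are hypothetical. The target format here is an illustration, not something INCEpTION produces. It is a milestone-style XML in which each annotation becomes an empty start marker and an empty end marker. Unlike nested inline elements, this representation stays well-formed even when annotations cross sentence boundaries.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape, quoteattr

Span = Tuple[int, int, str]  # (begin, end, label); assumes begin < end


def load_spans_from_xmi(xmi_path: str, typesystem_path: str,
                        layer_type: str, feature: str) -> Tuple[str, List[Span]]:
    """Read the document text and labelled spans from an XMI export using DKPro cassis."""
    # Imported here so that to_milestone_xml() also works without cassis installed.
    from cassis import load_cas_from_xmi, load_typesystem

    with open(typesystem_path, "rb") as f:
        typesystem = load_typesystem(f)
    with open(xmi_path, "rb") as f:
        cas = load_cas_from_xmi(f, typesystem=typesystem)
    # layer_type and feature are placeholders, e.g. "webanno.custom.MyLayer" and "value".
    spans = [(a.begin, a.end, getattr(a, feature) or "") for a in cas.select(layer_type)]
    return cas.sofa_string, spans


def to_milestone_xml(text: str, spans: List[Span]) -> str:
    """Write spans as empty start/end marker elements, so crossing spans stay well-formed XML."""
    events = []
    for i, (begin, end, label) in enumerate(spans):
        events.append((begin, 1, i, f'<start id="a{i}" label={quoteattr(label)}/>'))
        events.append((end, 0, i, f'<end id="a{i}"/>'))
    # Sort by offset; at equal offsets, end markers come before start markers.
    events.sort(key=lambda e: e[:3])
    parts, pos = ["<doc>"], 0
    for offset, _, _, marker in events:
        parts.append(escape(text[pos:offset]))  # text between the previous and this marker
        parts.append(marker)
        pos = offset
    parts.append(escape(text[pos:]))
    parts.append("</doc>")
    return "".join(parts)


if __name__ == "__main__":
    text = "Do you own a house? Yes\nDo you commute to work? No"
    # A sentence span and an annotation that crosses it.
    print(to_milestone_xml(text, [(0, 19, "Sentence"), (7, 23, "A")]))
```

In a real script, the input to `to_milestone_xml` would come from `load_spans_from_xmi`, possibly combined with the sentence annotations selected from the same CAS.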