Closed bsharma5 closed 3 years ago
Inline XML format does not support overlapping annotations. If you have an annotation that crosses a sentence boundary (sentences are also annotations), then you have overlapping annotations. This is why the annotation is not there. The format is not suitable for your case.
You cannot configure the sentence splitting right now.
However, if your plain text input file should be interpreted as "one sentence per line", then you have the option of importing it using the "Plain text (one sentence per line)" format. Then INCEpTION will simply treat each line as a sentence and not look for sentence markers.
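To make the point about overlap concrete, here is a minimal sketch using the first line of the example file from the report. It is not INCEpTION's actual export logic. The helper name `fits_in_one_sentence` and the assumption that the splitter breaks right after the question mark are illustrative only.

```python
def fits_in_one_sentence(span, sentences):
    """Return True if the span lies entirely inside a single sentence.

    Only such spans can be written as properly nested inline XML tags.
    """
    begin, end = span
    return any(s_begin <= begin and end <= s_end for s_begin, s_end in sentences)


text = "Do you own a house? Yes"

# Assumption: the automatic splitter ends the first sentence after "?".
cut = text.index("?") + 1
split_on_question_mark = [(0, cut), (cut + 1, len(text))]

# With "Plain text (one sentence per line)", the whole line is one sentence.
one_sentence_per_line = [(0, len(text))]

long_span = (text.index("own"), len(text))    # "own a house? Yes"
short_span = (text.index("own"), cut - 1)     # "own a house"

print(fits_in_one_sentence(long_span, split_on_question_mark))
print(fits_in_one_sentence(short_span, split_on_question_mark))
print(fits_in_one_sentence(long_span, one_sentence_per_line))
```

Under these assumptions, only the longer span fails the check when the text is split at the question mark, which matches the behaviour described in the report.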
I understand now that sentences are also annotations. So how do I export in order to see such cross-sentence annotations?
On Fri, Oct 16, 2020, 11:37 AM Richard Eckart de Castilho < notifications@github.com> wrote:
Inline XML format does not support overlapping annotations. If you have an annotation that crosses a sentence boundary (sentences are also annotations), then you have overlapping annotations. This is why the annotation is not there. The format is not suitable for your case.
You can export e.g. as UIMA CAS XMI or WebAnno TSV 3 format.
If you need to post-process the exported data, you might want to go with XMI and have a look at DKPro cassis. It allows you to load the data in a Python script. So you could code yourself a Python script which transforms XMI into any target format you can come up with.
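A rough sketch of such a script, assuming the project was exported as UIMA CAS XMI together with its type system file. The function names, the file paths passed in, and the tab-separated target format are hypothetical. The custom layer's type name and feature name depend on your project and must be looked up in the exported type system.

```python
def to_standoff_lines(text, annotations):
    """Format (begin, end, label) triples as tab-separated standoff lines.

    Standoff output keeps offsets separate from the text, so spans that
    cross sentence boundaries are not a problem.
    """
    ordered = sorted(annotations, key=lambda a: (a[0], a[1]))
    return [f"{b}\t{e}\t{label}\t{text[b:e]}" for b, e, label in ordered]


def load_annotations(typesystem_path, xmi_path, type_name, feature_name):
    """Read the document text and all annotations of one type from an XMI file.

    type_name and feature_name are placeholders for your own layer and
    feature, as defined in the exported type system.
    """
    # Imported lazily so the formatter above also works without cassis.
    from cassis import load_typesystem, load_cas_from_xmi

    with open(typesystem_path, "rb") as f:
        typesystem = load_typesystem(f)
    with open(xmi_path, "rb") as f:
        cas = load_cas_from_xmi(f, typesystem=typesystem)

    triples = [
        (a.begin, a.end, getattr(a, feature_name))
        for a in cas.select(type_name)
    ]
    return cas.sofa_string, triples
```

Usage would be along the lines of `text, triples = load_annotations(...)` followed by `print("\n".join(to_standoff_lines(text, triples)))`.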
Describe the bug Spans annotated across a sentence boundary do not show up in the exported Inline XML file. Possibly due to a sentence splitter issue?
To Reproduce Create a simple layer of type Span with the option "Allow crossing sentence boundaries" selected. Add a feature to this layer with a tagset of string values.
Try annotating the following simple file with two lines:
Do you own a house? Yes
Do you commute to work? No
Annotate the span "own a house? Yes" with a tag value, say "A". In the output file exported as Inline XML, the tag "A" is not present. If I annotate just "own a house", the tag is present in the exported Inline XML file.
Expected behavior Looks like this issue is because of the way the sentence is split. Currently, the sentence splitter does splitting on the question mark (?) and since the span crosses the sentence boundary, the output does not show the span.
Note: Can the tool be configured to ignore the question mark when splitting sentences? Or is it a bug that the output does not contain the tag?
Screenshots NA