inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Remote API should fail when unsupported format is requested #2070

Closed sandimschuh closed 3 years ago

sandimschuh commented 3 years ago

Describe the bug: When I request an annotation in HTML format via the remote API, I get the annotation in the WebAnno TSV format instead of HTML.

To reproduce:

  1. Enable the remote API in the settings.properties file
  2. Open the Swagger UI
  3. Select the AERO API
  4. Execute a GET request for endpoint /api/aero/v1/projects/{projectId}/documents/{documentId}/annotations/{userId} with format set to HTML. E.g. http://localhost:8080/api/aero/v1/projects/0/documents/3/annotations/admin?format=HTML
  5. Click on the "Download file" link.
  6. Open the downloaded file and verify that it is not an HTML file
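The request from step 4 can also be put together outside the Swagger UI. The following is only a minimal sketch: the helper name `build_annotations_url` is hypothetical, the host and IDs are taken from the example URL above, and authentication is left out entirely.

```python
from urllib.parse import quote, urlencode


def build_annotations_url(base_url, project_id, document_id, user_id, fmt):
    """Build the AERO annotations URL from the issue (hypothetical helper)."""
    # Path segments are percent-encoded so unusual user IDs stay valid.
    path = "/api/aero/v1/projects/{}/documents/{}/annotations/{}".format(
        quote(str(project_id), safe=""),
        quote(str(document_id), safe=""),
        quote(str(user_id), safe=""),
    )
    # The requested export format is passed as the "format" query parameter.
    return f"{base_url.rstrip('/')}{path}?{urlencode({'format': fmt})}"


# Values below mirror the example URL in step 4.
url = build_annotations_url("http://localhost:8080", 0, 3, "admin", "HTML")
print(url)
```

Fetching this URL with any HTTP client and inspecting the first lines of the response body is enough to see whether the server returned HTML or fell back to TSV.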

Expected behavior: The downloaded file should be an HTML file, not a TSV file.

Screenshots:

[Image: the Swagger UI]

[Image: the downloaded file]


reckart commented 3 years ago

We do not support HTML export, so the request falls back to the default format. So I guess you would prefer an error in that case rather than getting back a format other than the one requested?
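The difference between the two behaviors can be illustrated with a small sketch. This is not INCEpTION's code: the function names, the exception class, and the format identifiers `tsv` and `xmi` are placeholders chosen only for illustration.

```python
class UnsupportedFormatError(ValueError):
    """Raised when a client asks for a format that cannot be exported."""


def resolve_format_lenient(requested, supported, default):
    # Behavior described in the issue: silently fall back to the default.
    return requested if requested in supported else default


def resolve_format_strict(requested, supported):
    # Behavior asked for in the issue title: fail loudly instead.
    if requested not in supported:
        raise UnsupportedFormatError(
            f"Unsupported format {requested!r}; supported: {sorted(supported)}"
        )
    return requested


# Placeholder format identifiers, not the real INCEpTION format IDs.
SUPPORTED = {"tsv", "xmi"}

print(resolve_format_lenient("HTML", SUPPORTED, default="tsv"))
try:
    resolve_format_strict("HTML", SUPPORTED)
except UnsupportedFormatError as err:
    print("error:", err)
```

In a remote API, the strict variant would typically be mapped to a client-error response, so the caller learns that the requested format is not available rather than receiving a file in a different one.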

sandimschuh commented 3 years ago

Actually, HTML would be a good way to display the annotation results for a document (outside INCEpTION). Would it be a big deal to implement an HTML writer? Or is there any other format that is good for visualizing the annotations?

reckart commented 3 years ago

For importing HTML files, we use the DKPro Core HtmlDocumentReader. Its counterpart for writing files is the XmlDocumentWriter.

The HtmlDocumentReader takes the XML/HTML structure of the HTML file being read and represents it as annotations (annotations not visible in INCEpTION). The XmlDocumentWriter takes the annotation-encoded XML/HTML structure and writes it back into an XML (HTML) file. In this process, only the text and the XML/HTML annotations are considered. Other annotations are completely ignored because there is no guarantee that these could be aligned with the strictly hierarchical XML/HTML structure.

If you wanted to export a visualization, there would in theory be the BratWriter, but I believe you would find it very surprising and strange because the output it produces looks nothing like the brat visualization you see in INCEpTION. This is because INCEpTION has access to additional information on layers and features which the DKPro Core BratWriter does not have, since that additional information is part of INCEpTION and not of UIMA.

A possible idea might be to take the XmlDocumentWriter and enhance it into an HtmlDocumentWriter which would store annotations e.g. as W3C microdata while also embedding some JavaScript to interpret that microdata and render it as highlights over the text.
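As a rough illustration of the microdata idea, the sketch below wraps annotated spans of a plain text in `itemscope`/`itemtype`/`itemprop` attributes. It is not based on the XmlDocumentWriter: the function name, the `coveredText` property, and the `https://example.org/types/` vocabulary are all made up for this example, the JavaScript rendering step is left out, and overlapping annotations are simply rejected because they cannot be nested cleanly into the markup.

```python
from html import escape


def to_microdata_html(text, annotations, type_base="https://example.org/types/"):
    """Render (begin, end, label) spans as microdata (hypothetical sketch)."""
    parts = []
    cursor = 0
    for begin, end, label in sorted(annotations):
        # Overlapping or out-of-range spans do not fit a strict hierarchy.
        if begin < cursor or end > len(text) or begin >= end:
            raise ValueError(f"cannot place span ({begin}, {end}, {label!r})")
        parts.append(escape(text[cursor:begin]))
        parts.append(
            f'<span itemscope itemtype="{escape(type_base + label)}">'
            f'<span itemprop="coveredText">{escape(text[begin:end])}</span>'
            "</span>"
        )
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


html_out = to_microdata_html("Ann & Bob", [(0, 3, "PER")])
print(html_out)
```

A script embedded in the exported page could then select the `itemscope` elements and style them as highlights, which is the part of the idea this sketch does not cover.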