inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Export document in HTML format #3267

Open jfiala opened 2 years ago

jfiala commented 2 years ago

Is your feature request related to a problem? Please describe.
Currently, exported data is only usable by technical users. It would be nice to have a format that is usable by both technical and non-technical users.

Describe the solution you'd like
The document should be exportable as an HTML file. The result should be viewable by people who do not use INCEpTION and still be usable by programs (e.g. via data-... attributes). That way, the HTML file could be re-imported while remaining viewable by users who do not use INCEpTION/brat.

Describe alternatives you've considered
The Apache CAS XMI export appears to be clean and readable for both users and programs. However, it is not easily possible to view the annotated document without INCEpTION.

There is an XMI viewer, but it is also Java-based, and its last release was 4 years ago (2018): http://nilsreiter.github.io/SimpleXmiViewer/

So users who do not want to install anything locally have no way to view the results without INCEpTION.
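The export requested in this issue (an HTML file that renders in any browser but can be parsed back by a program via data-... attributes) can be sketched as follows. This is a minimal illustration under stated assumptions, not an INCEpTION format: annotations are assumed to be non-overlapping `(begin, end, label)` triples, and the attribute names `data-begin`, `data-end`, and `data-label` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


def to_html(text, annotations):
    """Render text with non-overlapping (begin, end, label) annotations as HTML.

    The data-begin/data-end/data-label attribute names are hypothetical.
    """
    out, pos = [], 0
    for begin, end, label in sorted(annotations):
        if begin < pos:
            raise ValueError("this sketch only handles non-overlapping annotations")
        out.append(escape(text[pos:begin]))
        out.append(
            f'<mark data-begin="{begin}" data-end="{end}" data-label="{escape(label)}" '
            f'title="{escape(label)}">{escape(text[begin:end])}</mark>'
        )
        pos = end
    out.append(escape(text[pos:]))
    return "<p>" + "".join(out) + "</p>"


class AnnotationReader(HTMLParser):
    """Recover the plain text and the annotations from HTML made by to_html."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities are unescaped for us
        self.text_parts = []
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "mark" and "data-label" in a:
            self.annotations.append(
                (int(a["data-begin"]), int(a["data-end"]), a["data-label"])
            )

    def handle_data(self, data):
        self.text_parts.append(data)


text = "INCEpTION exports CAS XMI."
annotations = [(0, 9, "Software"), (18, 25, "Format")]
page = to_html(text, annotations)

reader = AnnotationReader()
reader.feed(page)
reader.close()  # flush any buffered text
print(reader.annotations == annotations, "".join(reader.text_parts) == text)
# prints: True True
```

The round trip works because the visible text and the machine-readable offsets live in the same elements. Overlapping or stacked annotations would need a different layout, since HTML elements cannot cross each other.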

reckart commented 2 years ago

There are many ways that annotated texts could be rendered. For this reason, INCEpTION supports different annotation editors and also different export formats targeted at different use-cases. It is not clear to me what the particular use-case would be here. In particular, just stating that the export should be an HTML file does not say how the annotated data should actually be visually represented.

I think there might also be two things mixed up here that do not necessarily need to go together, namely being able to capture the state of a visualization (e.g. adding some kind of "print" functionality to existing editors) and the import/export of full annotated data. Why not keep these aspects separate?

jfiala commented 2 years ago

That's true; in fact, a CAS XMI file carries all the information necessary to render the annotations.

Perhaps the best solution would be a configurable CAS XMI viewer that renders the XML as HTML in any browser? Such a viewer could also make the XMI content accessible to screen readers, which IMHO is currently hard when rendering with SVG only.

reckart commented 2 years ago

UIMA CAS XMI files and the associated type system descriptions only provide basic information. The layer and feature configurations in INCEpTION provide a lot of additional information that is important to decide how to render annotations. For example, the concept of a "relation" that INCEpTION renders as an arrow with a label does not exist as such at the level of XMI files. That is why I believe a kind of "visual export" would probably happen at the level of a particular editor and would essentially just create a snapshot of what that editor has drawn into the browser - capturing whatever approach the editor takes for drawing and taking into account whatever configuration applies to that particular editor.

jfiala commented 2 years ago

Do you think it makes sense to have a lightweight HTML viewer for Apache CAS XMI (for now without relation support)?

reckart commented 2 years ago

For some people that may make sense. "Lightweight" is a pretty broad term. Would it show only annotated spans? What about the features? What about feature structures that are used as feature values but are not annotations themselves? How about overlapping/stacking annotations? There are many questions. Depending on how far one wants to go, it would be necessary to implement a full XMI "deserializer" in JavaScript that is able to resolve feature structure references, handle different types of feature values, and understand a typesystem.xml (which is what the DKPro Cassis deserializer does). If the answer to many of these questions is "no", you might get away with a pretty simple viewer, though.
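To illustrate why such a viewer needs the type system: in XMI, a reference to another feature structure is just an `xmi:id` stored in an attribute, which looks exactly like an integer feature. The sketch below (Python standard library only) indexes top-level elements by `xmi:id` and resolves only the attributes it is told are references. The `REFERENCE_FEATURES` set is a hypothetical stand-in for what a real deserializer such as DKPro Cassis derives from `typesystem.xml`, and the `SAMPLE` document is hand-written, loosely modeled on DKPro Core types; multi-valued references (such as the `members` list of `cas:View`) are ignored.

```python
import xml.etree.ElementTree as ET

XMI_ID = "{http://www.omg.org/XMI}id"

# Hypothetical stand-in for type-system knowledge. In XMI a reference is just
# an xmi:id stored in an attribute, so without typesystem.xml a reader cannot
# tell a reference like Governor="11" from an integer feature like begin="5".
REFERENCE_FEATURES = {"sofa", "Governor", "Dependent"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def load_xmi(xml_text):
    """Index feature structures by xmi:id and resolve reference features."""
    root = ET.fromstring(xml_text)
    # Pass 1: one record per top-level element that carries an xmi:id.
    records = {}
    for el in root:
        fs_id = el.get(XMI_ID)
        if fs_id is not None:
            feats = {k: v for k, v in el.attrib.items() if k != XMI_ID}
            records[fs_id] = {"type": local_name(el.tag), "features": feats}
    # Pass 2: replace reference values with the record they point to.
    for rec in records.values():
        feats = rec["features"]
        for name in [n for n in REFERENCE_FEATURES if n in feats]:
            feats[name] = records.get(feats[name])  # None if the id is unknown
    return records


SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmi:version="2.0"
    xmlns:cas="http:///uima/cas.ecore"
    xmlns:type="http:///de/tudarmstadt/ukp/dkpro/core/api/segmentation/type.ecore"
    xmlns:dependency="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/dependency.ecore">
  <cas:NULL xmi:id="0"/>
  <type:Token xmi:id="10" sofa="1" begin="0" end="4"/>
  <type:Token xmi:id="11" sofa="1" begin="5" end="10"/>
  <dependency:Dependency xmi:id="20" sofa="1" begin="5" end="10" Governor="11" Dependent="10"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="Cats sleep"/>
  <cas:View sofa="1" members="10 11 20"/>
</xmi:XMI>"""

records = load_xmi(SAMPLE)
dep = records["20"]["features"]
text = dep["sofa"]["features"]["sofaString"]
gov, dpt = dep["Governor"]["features"], dep["Dependent"]["features"]
print(text[int(gov["begin"]):int(gov["end"])], "->", text[int(dpt["begin"]):int(dpt["end"])])
# prints: sleep -> Cats
```

Everything beyond this (primitive feature types, arrays, lists, multiple views) is where the type system becomes unavoidable.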

That said, there is a new UIMA CAS JSON format which may be a better basis for building lightweight viewers than the XMI format.

jfiala commented 2 years ago

INCEpTION supports UIMA CAS JSON for export, but unfortunately not for import. That is why we are using the XML-based format. Are you planning to support UIMA CAS JSON for import as well?

reckart commented 2 years ago

The CAS JSON format supported by INCEpTION at the moment is an old format that does not contain sufficient information for a full deserialization; it is one-way.

It is planned to add support for the new format when a release of it is available.

jfiala commented 2 years ago

We could support annotated spans and sentence annotations, as well as overlapping/stacking annotations, without problems in a WAI-compliant way. We haven't used relations yet; that would be a nice extension. We'd have to look at how to render them nicely in HTML (without SVG).
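One common way to show overlapping and stacked spans in plain HTML without SVG is to cut the text at every annotation boundary, so that each resulting segment can be a single, non-nested element that records which annotations cover it. The sketch below assumes `(begin, end, label)` triples; the `data-ann` attribute and the `ann` class are hypothetical names. Which ARIA markup best exposes the labels to screen readers is a separate WAI question this sketch does not settle; `title` is used only as a simple hover hint.

```python
from html import escape


def segment(text, annotations):
    """Cut the text at every annotation boundary.

    Returns (start, stop, covering) triples, where covering lists the indices of
    the annotations spanning that segment. Overlapping and stacked annotations
    thus become flat, non-nested segments. Zero-width annotations are dropped.
    """
    cuts = {0, len(text)}
    for begin, end, _ in annotations:
        cuts.update((begin, end))
    cuts = sorted(cuts)
    segments = []
    for start, stop in zip(cuts, cuts[1:]):
        covering = [i for i, (b, e, _) in enumerate(annotations) if b <= start and stop <= e]
        segments.append((start, stop, covering))
    return segments


def render(text, annotations):
    """Render segments as plain <span> elements (no SVG).

    data-ann (hypothetical name) lists the covering annotation indices; title
    shows the labels on hover.
    """
    parts = []
    for start, stop, covering in segment(text, annotations):
        chunk = escape(text[start:stop])
        if not covering:
            parts.append(chunk)
            continue
        ids = " ".join(str(i) for i in covering)
        labels = escape(", ".join(annotations[i][2] for i in covering))
        parts.append(f'<span class="ann" data-ann="{ids}" title="{labels}">{chunk}</span>')
    return "<p>" + "".join(parts) + "</p>"


text = "New York City"
annotations = [(0, 8, "GPE"), (0, 8, "LOC"), (4, 13, "ORG")]  # stacked + overlapping
for seg in segment(text, annotations):
    print(seg)
# prints:
# (0, 4, [0, 1])
# (4, 8, [0, 1, 2])
# (8, 13, [2])
```

Because each segment lists its covering annotations, a program can rebuild the original spans by merging consecutive segments that share an index, while a browser simply shows flat text with styled runs.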

reckart commented 2 years ago

You could have a look e.g. at https://github.com/inception-project/inception-annotatorjs-editor-plugin - maybe something can be derived from it.

reckart commented 2 years ago

The INCEpTION editor plugins operate on a very simple "visual" model consisting of spans and arcs. Normally, the CAS data is transformed to that model by the INCEpTION backend. If you implement a pure JS converter from CAS data to the "visual" model, you should basically be able to use any of the editor plugins as viewers.
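A pure converter from CAS data to a "visual" model of spans and arcs could look roughly like the sketch below. The `Span`, `Arc`, and `VisualModel` classes and the `LAYERS` dict are hypothetical and are not INCEpTION's actual classes or wire format. The point it illustrates is that deciding whether a type is drawn as a span or as an arc (and which features are the arc endpoints) requires layer configuration that the CAS does not carry by itself.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    id: str
    begin: int
    end: int
    label: str


@dataclass
class Arc:
    id: str
    source: str  # id of the span the arc starts at
    target: str  # id of the span the arc points to
    label: str


@dataclass
class VisualModel:
    spans: list[Span] = field(default_factory=list)
    arcs: list[Arc] = field(default_factory=list)


# Hypothetical layer configuration. The CAS itself does not say which types are
# relations or which features hold the endpoints, so a converter needs it.
LAYERS = {
    "Token": {"kind": "span"},
    "Dependency": {"kind": "relation", "source": "Governor", "target": "Dependent"},
}


def to_visual_model(annotations, layers):
    """Convert flat annotation dicts into spans and arcs.

    Each dict has 'id', 'type', offsets, and feature values; relation endpoint
    features hold the ids of other annotations.
    """
    model = VisualModel()
    for ann in annotations:
        layer = layers.get(ann["type"])
        if layer is None:
            continue  # types without a layer configuration are not displayed
        if layer["kind"] == "span":
            model.spans.append(Span(ann["id"], int(ann["begin"]), int(ann["end"]), ann["type"]))
        else:
            model.arcs.append(Arc(ann["id"], ann[layer["source"]], ann[layer["target"]], ann["type"]))
    return model


annotations = [
    {"id": "10", "type": "Token", "begin": "0", "end": "4"},
    {"id": "11", "type": "Token", "begin": "5", "end": "10"},
    {"id": "20", "type": "Dependency", "begin": "5", "end": "10", "Governor": "11", "Dependent": "10"},
]
model = to_visual_model(annotations, LAYERS)
print(model.arcs)
# prints: [Arc(id='20', source='11', target='10', label='Dependency')]
```

Keeping the layer configuration as an explicit input means the same converter could serve custom layers as well, as long as their configuration is exported alongside the data.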

jfiala commented 2 years ago

> For example, the concept of a "relation" that INCEpTION renders as an arrow with a label does not exist as such at the level of XMI files.

I just took a look at the XMI export of relations/dependencies, and they are currently supported by INCEpTION's Apache UIMA CAS 1.1 export:

    <pos:POS xmi:id="3992" sofa="1" begin="310" end="331"/>
    <pos:POS xmi:id="4038" sofa="1" begin="347" end="356"/>
    <dependency:Dependency xmi:id="4084" sofa="1" begin="347" end="356" Governor="668" Dependent="720"/>

However, the UIMA reference documentation does not seem to specify Dependency, for example, at all? https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xmi

reckart commented 2 years ago

The core UIMA framework has no concept of relations, dependencies, etc. It leaves the definition of the annotation schema mostly to the user.

For its built-in layers, INCEpTION uses types from the DKPro Core type system: https://dkpro.github.io/dkpro-core/releases/2.2.0/docs/typesystem-reference.html

The custom layer types are defined by the INCEpTION user.
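The link between an XMI element and its type comes from the namespace: in exported XMI files, the Java package of a UIMA type appears as the namespace URI (`http:///` + package path with `/` + `.ecore`, e.g. `http:///uima/cas.ecore`) and the short type name as the element name. The sketch below, assuming that convention and assuming the `dependency` prefix in the export quoted earlier is bound to the usual DKPro Core namespace, maps an element back to the fully qualified DKPro Core type name and vice versa. The helper names `type_name` and `xmi_tag` are hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIX, SUFFIX = "http:///", ".ecore"


def type_name(tag):
    """Map an ElementTree tag '{http:///a/b/c.ecore}Name' to the UIMA type 'a.b.c.Name'."""
    if not tag.startswith("{"):
        return tag  # no namespace, nothing to map
    ns, local = tag[1:].split("}", 1)
    if not (ns.startswith(PREFIX) and ns.endswith(SUFFIX)):
        raise ValueError(f"not a UIMA XMI type namespace: {ns}")
    package = ns[len(PREFIX):-len(SUFFIX)].replace("/", ".")
    return f"{package}.{local}"


def xmi_tag(full_name):
    """Inverse mapping: 'a.b.c.Name' -> '{http:///a/b/c.ecore}Name'."""
    package, _, local = full_name.rpartition(".")
    return "{" + PREFIX + package.replace(".", "/") + SUFFIX + "}" + local


element = ET.fromstring(
    '<dependency:Dependency '
    'xmlns:dependency="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/dependency.ecore" '
    'Governor="668" Dependent="720"/>'
)
print(type_name(element.tag))
# prints: de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency
```

With the full type name in hand, a viewer can look up the type's features (such as `Governor` and `Dependent`) in the DKPro Core type system reference or in an exported `typesystem.xml`.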

jfiala commented 2 years ago

Thank you for the explanation! The modelling of the XMI export is really nice!

reckart commented 2 years ago

Have you had a look at the new JSON export in v24?