inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0
593 stars 151 forks

Creating some kind of user interface for annotators (using HTML file), is it possible? #2595

Closed ogozcelik closed 3 years ago

ogozcelik commented 3 years ago

Hello, thank you for this application; it helps a lot with my annotation work.

My use case for INCEpTION: I use INCEpTION for text classification. For this purpose I reformat my dataset as an HTML file, since I couldn't find a good way to use the application for text classification.

Alternatives I've considered

Thanks to HTML's flexibility, I created a kind of user interface for annotators. See the image below: example_1 This is actually simple HTML code that includes texts from my dataset. With this code, I can draw boxes, create texts that cannot be clicked, etc. This creates a basic user interface for text classification. The basic user interface text: basic_interface.txt

Feature request: What I am asking for is a slightly improved version of this. See the picture below: example_2 This is also HTML code, including image sources, classes, etc., that I run locally to see this page. When I import this HTML code into INCEpTION, I can only see the text. There are no images, no backgrounds, nothing of the Twitter user interface. See the picture below: example_3

The Twitter user interface text: tweet_dataset.txt

Question in short: Is there any way to import this kind of HTML code into INCEpTION and see the same page as I do locally? Or could there be?

Thanks.

reckart commented 3 years ago

The current goal of the HTML importer is that basic formatting can be preserved, e.g. headings, paragraphs, bold, italics, etc. etc. Only the body of the HTML document is actually used - no header information. The goal of the HTML mode was not to provide a way of customizing the UI, although the idea is interesting.

That said, I guess as long as you use no CSS classes (other than maybe bootstrap which INCEpTION itself provides - 20.x uses BS4, 21.x will use BS5) and stick to doing everything with the "style" attribute, you should be ok.

As for images - if you want to link to external images, check what the development tools in your browser (network panel, console) tell you. I think it should be possible to use img tags or images in style attributes, but you may have to ensure that the server that provides these images sets a COEP header that allows you to reference them.

reckart commented 3 years ago

When you import, which of the HTML formats do you choose? The better one is the one called "HTML" - don't try with the "HTML (legacy)" - that's not going to get you anywhere.

ogozcelik commented 3 years ago

Thank you for your quick response. As you suggested, I focused on style attributes. I copied all of my CSS classes into style attributes, and it surprisingly worked as I wanted. :+1:

I am sharing my solution for users who want to provide some kind of user interface for their annotation processes. In our case, I annotate Tweets. After importing an HTML file, the annotators experience the program as if they were really annotating on Twitter. (The exciting part is that anyone can write HTML code imitating other social media platforms for their purposes, e.g. Facebook, YouTube, etc.)
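For anyone who wants to automate the step of copying CSS classes into style attributes, here is a minimal sketch using only the Python standard library. The `CLASS_STYLES` table and the class names in it are hypothetical; in practice the declarations would be copied from your own stylesheet. It only handles plain class selectors (no descendant selectors, no media queries) and it drops comments and the doctype, which should not matter since only the body is used on import.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical class-to-declarations table, copied by hand from a stylesheet.
CLASS_STYLES = {
    "tweet": "border:1px solid #ccc; padding:8px;",
    "handle": "color:#657786;",
}


class StyleInliner(HTMLParser):
    """Re-emit HTML with class attributes replaced by inline style attributes."""

    def __init__(self, class_styles):
        super().__init__()
        self.class_styles = class_styles
        self.out = []

    def _open_tag(self, tag, attrs, self_closing=False):
        classes, own_style, kept = [], "", []
        for name, value in attrs:
            if name == "class":
                classes = (value or "").split()
            elif name == "style":
                own_style = (value or "").strip()
            else:
                kept.append((name, value))
        # Class rules first, the element's own style last so that it wins.
        rules = [self.class_styles.get(c, "").strip() for c in classes]
        rules.append(own_style)
        merged = " ".join(r if r.endswith(";") else r + ";" for r in rules if r)
        if merged:
            kept.append(("style", merged))
        parts = [tag]
        for name, value in kept:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser hands over unescaped text, so escape it again on output.
        self.out.append(escape(data, quote=False))


def inline_styles(html, class_styles):
    """Return html with known classes expanded into style attributes."""
    parser = StyleInliner(class_styles)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Usage would look like `inline_styles(open("tweet_dataset.txt").read(), CLASS_STYLES)`, with the result saved as the file to import using the "HTML" format.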

Screenshot from INCEpTION twitter_ui

reckart commented 3 years ago

It is a neat idea, thanks for sharing!

Do you do anything special to prevent users from annotating parts of the "UI" that you do not want them to annotate? E.g. disabling pointer events using styles?

ogozcelik commented 3 years ago

Actually, I am currently working on this. It is possible to prevent users from clicking some parts of the UI. In my case I make only the Tweet text clickable; this can be done by adding "user-select:none;" to the style attributes of everything else. However, when the user double-clicks the text, the unselectable parts are also selected, which we don't want.

My workaround for now is to add end-of-sentence marks such as "!", "?", "." in the same color as the background (so that they are invisible to the user). For example, the top line of a tweet becomes "user @user!", so when the user double-clicks the tweet text, the boundary starts after the "!". The same applies to the text part: before importing into the program, I remove punctuation (!, ?, .) inside the tweet text that would mislead the application. Finally, the timestamp is processed as well: 02.09.2021 becomes 02.09.2021!. This gives the annotators an interface like the following: twitter_ui

In short, some preprocessing steps need to be carried out before importing into INCEpTION.
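The preprocessing described above could be sketched roughly as follows. All function names and the `BACKGROUND` colour are made up for illustration, and the markup is only a guess at what such a tweet box might look like; it is not the markup from tweet_dataset.txt.

```python
import re
from html import escape

BACKGROUND = "#ffffff"  # assumed background colour of the tweet box


def strip_sentence_marks(text):
    """Remove '!', '?' and '.' from the tweet text so that they do not
    create extra sentence boundaries inside the annotatable area."""
    return re.sub(r"[!?.]", "", text)


def ui_label(text, background=BACKGROUND):
    """Render a non-selectable UI label followed by a '!' drawn in the
    background colour, which acts as a hidden sentence boundary."""
    sentinel = f'<span style="color:{background};">!</span>'
    return f'<span style="user-select:none;">{escape(text)}{sentinel}</span>'


def render_tweet(user, handle, timestamp, text):
    """Build one tweet box: header and timestamp are UI, the text is data."""
    header = ui_label(f"{user} @{handle}")
    body = escape(strip_sentence_marks(text))
    footer = ui_label(timestamp)
    return (
        f'<div style="background:{BACKGROUND}; border:1px solid #ccc;">'
        f"<p>{header}</p><p>{body}</p><p>{footer}</p></div>"
    )
```

Note that stripping "." also breaks things like URLs or decimal numbers inside the tweet text, so the character set may need adjusting for a given dataset.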

Current problem: I cannot skip the texts that I don't want to include in the dataset, and I could not find a solution for that. When I export the annotated document as WebAnno TSV 3.3 (I guess the best way to get output for text classification), the output contains all texts (including the unselectable ones), as follows: output

Thus, the output also needs some processing after annotation, like removing user texts, timestamp texts, etc.

reckart commented 3 years ago

Yes, post-processing is also necessary. But IMHO the TSV format is not very well suited for this post-processing. I'd rather suggest exporting as XMI and using DKPro Cassis (if you are a Python person) to load the data and then extract what you need. You'll find representations of the HTML structure as annotations in the XMI files and could use them to extract all user-created annotations and text within specific areas of the HTML tree.
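A rough sketch of what that could look like with DKPro Cassis is below. Several things here are assumptions, not confirmed in this thread: that the HTML elements show up under the type name `org.dkpro.core.api.xml.type.XmlElement` with a `qName` feature, that the tweet text sits in an element you can identify by tag name, and the hypothetical layer and feature names you pass in. Check the TypeSystem.xml in your own export for the actual names.

```python
# Assumed type name for the HTML structure annotations; verify it against
# the TypeSystem.xml that comes with your XMI export.
XML_ELEMENT_TYPE = "org.dkpro.core.api.xml.type.XmlElement"


def covered_by(span, regions):
    """True if the (begin, end) span lies fully inside one of the regions."""
    begin, end = span
    return any(r_begin <= begin and end <= r_end for r_begin, r_end in regions)


def keep_inside(annotations, regions):
    """Keep only (begin, end, label) triples located inside a region."""
    return [a for a in annotations if covered_by((a[0], a[1]), regions)]


def load_tweet_annotations(xmi_path, typesystem_path, layer, feature, tag):
    """Load an XMI export and return (covered text, label) pairs for
    annotations of `layer` that sit inside HTML elements named `tag`.

    `layer`, `feature` and `tag` are placeholders for your own project,
    e.g. a custom span layer, its label feature, and the element name
    that wraps the tweet text.
    """
    # Imported here so the helpers above work without cassis installed.
    from cassis import load_cas_from_xmi, load_typesystem

    with open(typesystem_path, "rb") as f:
        typesystem = load_typesystem(f)
    with open(xmi_path, "rb") as f:
        cas = load_cas_from_xmi(f, typesystem=typesystem)

    regions = [
        (el.begin, el.end)
        for el in cas.select(XML_ELEMENT_TYPE)
        if getattr(el, "qName", None) == tag
    ]
    annotations = [
        (a.begin, a.end, getattr(a, feature, None)) for a in cas.select(layer)
    ]
    text = cas.sofa_string
    return [(text[b:e], label) for b, e, label in keep_inside(annotations, regions)]
```

The filtering is kept separate from the loading so that the region logic can be checked on plain tuples before pointing it at a real export.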