proycon / flat

FoLiA Linguistic Annotation Tool: Flat is a web-based linguistic annotation environment built around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich them with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
GNU General Public License v3.0
110 stars, 15 forks

Question: text class values when assigning multiple annotation labels #178

Closed: pirolen closed this issue 3 years ago

pirolen commented 3 years ago

Admittedly, I am not fully aware of all the requirements for an optimal configuration of an annotation scheme.

Here are my labels: https://github.com/pirolen/folia-resources/blob/main/bibkat_entities.foliaset.xml

Attached are screenshots of the GUI and the error message.

[Screenshots: 2021-08-02 at 11:54:28 and 2021-08-02 at 11:55:50]

I am going to test whether I can assign the two labels if the original text class is something other than 'current'.

Could it also be that the way the text is structured (div > par > list > listItem) restricts what can happen during annotation? (By the way, this is not an ideal structure: I'd like to represent this data as Entries, but could not get Entries rendered as such, so this was a workaround test.)

And is it valid to correct a string and label it in a single step?

pirolen commented 3 years ago

The assignment works fine when the text class is 'OCR' and the text is structured as Entries.

[Screenshot: 2021-08-06 at 19:39:18]

proycon commented 3 years ago

> If multiple labels need to be given to partly or fully overlapping spans of words (i.e., nested entity annotations), how are class values supposed to function? Should the uploaded documents have a text class other than 'current' (e.g. 'OCR')?

You should indeed be able to annotate multiple overlapping spans of words (technically they're not nested annotations: they're all specified at the same level and there's no hierarchy). The text class is indeed most often simply 'current'; the entity class, of course, comes from your own vocabulary.
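To illustrate what "same level, no hierarchy" means, here is a minimal sketch using only Python's standard library (not FLAT or FoLiApy code). It parses a hand-written, simplified FoLiA-style fragment in which two entities, with the hypothetical classes Author and Lemma, both point at the same word through wref elements. The ids are made up, and the FoLiA namespace and set declarations are left out for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified FoLiA-style fragment (namespace and set declarations omitted).
# The ids and the entity classes "Author" and "Lemma" are hypothetical.
FRAGMENT = """<s xml:id="s.1">
  <w xml:id="s.1.w.1"><t>Alwicus</t></w>
  <entities>
    <entity class="Author"><wref id="s.1.w.1" t="Alwicus"/></entity>
    <entity class="Lemma"><wref id="s.1.w.1" t="Alwicus"/></entity>
  </entities>
</s>"""


def labels_per_word(xml_text):
    """Map each referenced word id to the entity classes whose span covers it."""
    root = ET.fromstring(xml_text)
    labels = defaultdict(list)
    # Both <entity> elements are children of the same <entities> layer:
    # they are siblings, neither is nested inside the other.
    for entity in root.iter("entity"):
        for wref in entity.findall("wref"):
            labels[wref.get("id")].append(entity.get("class"))
    return dict(labels)


print(labels_per_word(FRAGMENT))  # {'s.1.w.1': ['Author', 'Lemma']}
```

Each entity refers to the word by id instead of containing it, which is why identical or overlapping spans do not conflict, and why adding a second label should not need to touch the word's text content at all.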

> Because I am getting an error when my original text class is 'current' and I want to tag the same span of words with multiple labels (e.g. 'Alwicus' is both Author and Lemma, please see screenshot): it says that multiple structural elements cannot be assigned to the 'current' class.

Did you use the New drop-down box and the + button to add a new entity after having added the first one (rather than editing the entity every time)? That's the proper way to add multiple overlapping entities. I guess you did, because in the second scenario it worked fine? If so, there might be a bug I need to look into. The text class (OCR or current) shouldn't really be a major factor in this. That error you got is rather odd, as that query isn't adding or modifying any text content...

pirolen commented 3 years ago

> Did you use the New drop-down box and the + button to add a new entity after having added the first one (rather than editing the entity every time)? That's the proper way to add multiple overlapping entities.

It might be that I was trying to do several things at the same time, e.g. correcting while adding a new label. When I now try to open this same document from the Document Index, I get this error:

"Fatal Error: The document server returned an error

HTTP Error 404: Not Found

FoLiA error in pirolen/FA-MBK-4-3_035245008_0030_abpproc: [ParseError] FoLiA exception in handling of @ line 109 (in parent @ parent line 108) : [DuplicateAnnotationError] Can not add multiple text content elements with the same class (current) to the same structural element!

Query was: DECLARE correction OF https://raw.githubusercontent.com/pirolen/folia-resources/main/bibkat_ocr_corrections.foliaset.xml"

proycon commented 3 years ago

It looks like the document got corrupted in some earlier stage (which of course is a bug and shouldn't happen). Can you send the document so I can try to reproduce it?

pirolen commented 3 years ago

Please find it attached. I probably gave FLAT a hard time with my annotation tests and several config modifications (including changing the tagset itself) at the same time :-(

[Attachment: FA-MBK-4-3_035245008_0030_abpproc.folia.xml.txt]

proycon commented 3 years ago

Something went wrong here:

                <w xml:id="FA-MBK-4-3_035245008_0030_abpproc.text.div.2.p.1.list.1.item.1.p.1.s.2.w.1" set="tokconfig-deu" class="WORD">
                  <t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl" datetime="2021-07-31T17:50:59">ecclesie</t>
                  <t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl">ecclesie</t>
                </w>

It seems an extra text element got added, which is not allowed when it has the same text class as the existing one (neither t element has a class attribute, so both count as 'current'). I'll check the log you provided earlier to see which edit caused it, and find a fix. This is definitely a bug; you are probably getting the errors reported in the original issue because something went wrong earlier. Probably any edit will now give this error.
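To find other corrupted files before opening them in FLAT, here is a minimal sketch (Python standard library only, not FoLiApy's own validator) that scans a FoLiA document for elements with more than one direct t child of the same class. Two assumptions are built in: a t without a class attribute counts as 'current' (the FoLiA default, and the class named in the error message), and only direct children are compared, so text at sentence and word level, or inside a correction's new and original parts, is not mistaken for a duplicate.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Drop a '{namespace}' prefix so the check works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def duplicate_text_classes(xml_text):
    """Return (xml:id, class) pairs for every element that has more than one
    direct <t> child with the same class. A <t> without a class attribute is
    counted as 'current', FoLiA's default text class."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        counts = Counter(
            child.get("class", "current")
            for child in elem
            if local_name(child.tag) == "t"
        )
        problems.extend((elem.get(XML_ID), cls) for cls, n in counts.items() if n > 1)
    return problems


corrupt = '<w xml:id="w.1"><t>ecclesie</t><t>ecclesie</t></w>'
two_layers = '<w xml:id="w.1"><t>ecclesie</t><t class="OCR">ecclesie</t></w>'
print(duplicate_text_classes(corrupt))     # [('w.1', 'current')]
print(duplicate_text_classes(two_layers))  # []
```

The second example shows that an additional text layer is fine as long as its class differs, as when a document keeps both an 'OCR' and a 'current' layer. To check a whole file, pass its raw bytes, e.g. duplicate_text_classes(open(path, "rb").read()).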

pirolen commented 3 years ago

Yes, it seemed (also for other files that got corrupted) that whatever I did afterwards gave an error. Admittedly, I was making many edits and corrections on one and the same element, e.g. because I sometimes missed which span was highlighted and then assigned the wrong tag, or kept changing the tagset, or tagged and corrected the orthography at the same time (is that possible at all?).

proycon commented 3 years ago

This one was the culprit:

ADD t OF https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl WITH text "ecclesie" datetime now confidence NONE FOR ID FA-MBK-4-3_035245008_0030_abpproc.text.div.2.p.1.list.1.item.1.p.1.s.2.w.1

The query failed though (as it should):

2021-07-31 17:50:59 - [QUERY FAILED] FoLiA Error in pirolen/FA-MBK-4-3_035245008_0030_abpproc: [DuplicateAnnotationError] Can not add multiple text content elements with the same class (current) to the same structural element!
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/foliadocserve/foliadocserve.py", line 661, in query
    result =  query(doc,False,self.debug >= 2)
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/fql.py", line 2135, in __call__
    focusselection, targetselection = self.action(self, targetselector, debug) #selecting focus elements further constrains the target selection (if any), return values will be lists
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/fql.py", line 1798, in __call__
    focusselection.append( target.add(action.focus.Class, **action.assignments) ) #handles span annotation too
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 2368, in add
    return self.append(child,*args,**kwargs)
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 4072, in append
    e = super(AbstractStructureElement,self).append(child, *args, **kwargs)
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 2205, in append
    if dopostappend: child.postappend()
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 4608, in postappend
    raise DuplicateAnnotationError("Can not add multiple text content elements with the same class (" + c.cls + ") to the same structural element!")
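The timestamp of the failed query (2021-07-31 17:50:59) matches the datetime attribute on the extra t element in the snippet above, as one would expect since the query asked for "datetime now". A sketch of using that correlation to locate stray elements in other corrupted files follows. It assumes every failure line starts like the one quoted here (date, time, " - [QUERY FAILED] "); that format is inferred from this single line, not from foliadocserve documentation, and the sample document is a shortened, made-up fragment.

```python
import re
import xml.etree.ElementTree as ET

# Assumed log format, inferred only from the one line quoted above:
# "YYYY-MM-DD HH:MM:SS - [QUERY FAILED] <message>"
FAILED_QUERY = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) - \[QUERY FAILED\] ")
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def failed_query_times(log_lines):
    """Return failed-query timestamps as 'YYYY-MM-DDTHH:MM:SS' strings."""
    times = []
    for line in log_lines:
        match = FAILED_QUERY.match(line)
        if match:
            times.append(f"{match.group(1)}T{match.group(2)}")
    return times


def elements_with_text_stamped_at(xml_text, times):
    """Return xml:ids of elements that have a <t> child whose datetime
    attribute, compared to the second (first 19 characters), is in `times`."""
    wanted = set(times)
    root = ET.fromstring(xml_text)
    hits = []
    for parent in root.iter():
        for child in parent:
            is_text = child.tag.rsplit("}", 1)[-1] == "t"
            if is_text and child.get("datetime", "")[:19] in wanted:
                hits.append(parent.get(XML_ID))
    return hits


LOG = [
    "2021-07-31 17:50:59 - [QUERY FAILED] FoLiA Error in pirolen/FA-MBK-4-3_035245008_0030_abpproc: "
    "[DuplicateAnnotationError] Can not add multiple text content elements with the same class "
    "(current) to the same structural element!",
]
DOC = '<w xml:id="w.1"><t datetime="2021-07-31T17:50:59">ecclesie</t><t>ecclesie</t></w>'
print(elements_with_text_stamped_at(DOC, failed_query_times(LOG)))  # ['w.1']
```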

I wonder why it ended up in the document after all, then.
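The thread does not settle why it ended up there. One explanation consistent with the traceback, offered only as a guess: main.py's append() calls child.postappend() after the child has been attached ("if dopostappend: child.postappend()"), so when postappend() raises DuplicateAnnotationError, the new t element may already be part of the in-memory document; if the document server later saved that document, the duplicate would be written out. The toy model below uses hypothetical Word and DuplicateAnnotationError classes (not FoLiApy's) to show the difference between validating after and before mutating.

```python
class DuplicateAnnotationError(Exception):
    """Hypothetical stand-in for FoLiApy's exception of the same name."""


class Word:
    """Toy structural element; `texts` holds (class, text) pairs for its <t> children."""

    def __init__(self):
        self.texts = []

    def append_then_check(self, cls, text):
        # Mutate first, validate afterwards: if the check fails,
        # the duplicate has already been stored.
        self.texts.append((cls, text))
        if sum(1 for c, _ in self.texts if c == cls) > 1:
            raise DuplicateAnnotationError(f"duplicate text class ({cls})")

    def check_then_append(self, cls, text):
        # Validate first: a rejected call leaves the element unchanged.
        if any(c == cls for c, _ in self.texts):
            raise DuplicateAnnotationError(f"duplicate text class ({cls})")
        self.texts.append((cls, text))


def texts_after_failed_duplicate(method_name):
    """Add 'current' text once, try to add it again via `method_name`,
    and return how many text entries the word ends up with."""
    word = Word()
    word.check_then_append("current", "ecclesie")
    try:
        getattr(word, method_name)("current", "ecclesie")
    except DuplicateAnnotationError:
        pass  # the second add is reported as failed in both cases
    return len(word.texts)


print(texts_after_failed_duplicate("append_then_check"))  # 2
print(texts_after_failed_duplicate("check_then_append"))  # 1
```

In both variants the caller sees the same exception, just as the log reports [QUERY FAILED]; only the first leaves the element modified, which would match a failed query whose result still shows up in the file.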

> or tagged and corrected the orthography at the same time (is that possible at all?)

Yep, you can do multiple annotations in one go.

And whatever you do, the tool should of course be robust against corrupting the document, so I consider this a good stress test :)

proycon commented 3 years ago

FoLiApy v2.5.5 has now been released and should provide more safeguards against corrupted documents, which is what led to this issue.

You'll still have to fix the corrupted document manually for now, though (remove the duplicate text element on line 109).
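For the manual fix, a cautious sketch (standard library only; the helper names are hypothetical) that deletes one line by number, but only after confirming it is a single-line t element whose class and text exactly match an adjacent t line, so no text content is lost. It assumes the duplicate sits on its own line, as in the snippet above, and that the line number (109, taken from the parse error) refers to the file exactly as saved. Deleting the line in a text editor achieves the same result.

```python
import re

# A <t> element that opens and closes on one line, e.g. '  <t set="...">word</t>'
T_LINE = re.compile(r"^\s*<t\b([^>]*)>(.*)</t>\s*$")
CLASS_ATTR = re.compile(r'\bclass="([^"]*)"')


def parse_t_line(line):
    """Return (class, text) for a single-line <t> element, or None.
    A missing class attribute is treated as FoLiA's default, 'current'."""
    match = T_LINE.match(line)
    if match is None:
        return None
    cls = CLASS_ATTR.search(match.group(1))
    return (cls.group(1) if cls else "current", match.group(2))


def remove_duplicate_t_line(lines, lineno):
    """Return `lines` without line `lineno` (1-based), but only if that line is
    a single-line <t> whose class and text exactly match an adjacent line."""
    idx = lineno - 1
    target = parse_t_line(lines[idx])
    if target is None:
        raise ValueError(f"line {lineno} is not a single-line <t> element")
    for other in (idx - 1, idx + 1):
        if 0 <= other < len(lines) and parse_t_line(lines[other]) == target:
            return lines[:idx] + lines[idx + 1:]
    raise ValueError(f"line {lineno} has no identical <t> neighbour")


snippet = [
    '<w xml:id="w.1" class="WORD">\n',
    '  <t datetime="2021-07-31T17:50:59">ecclesie</t>\n',
    '  <t>ecclesie</t>\n',
    '</w>\n',
]
print("".join(remove_duplicate_t_line(snippet, 2)))
```

For a real file, read it with readlines(), call remove_duplicate_t_line(lines, 109), and write the result to a copy first, so the original stays available if the check refuses or the result does not load.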

pirolen commented 3 years ago

I see, thanks. Admittedly, I am not sure about the functionality of the 'New' drop-down list in the lower part of the dialog window. I have probably not used it much so far; instead I used the 'N' button (above, in the 'Text' area) plus the Entity drop-down to add new annotations :-o That worked out, but possibly it was not meant to be used that way? So:

[Screenshot: 2021-08-10 at 11:35:11]

proycon commented 3 years ago

> What is the difference between using the 'N' button to add new annotations and using the drop-down list?

You can use the 'N' button, but it only applies to the annotation type the buttons correspond with! In the above screenshot, you cannot use "N" to add a new entity, only new text content! (Ideally there should be an N button in the entity row as well.)

The "New" field additionally allows adding annotation types that are not present yet.

> When would one want to add a new string, or a new text element?

Only when you have multiple text layers in your document; I don't think this is something you'll want to do much in practice. The correction facility (or a direct edit) is usually more appropriate.

pirolen commented 3 years ago

> You can use the 'N' button, but it only applies to the annotation type the buttons correspond with!

What is the annotation type the buttons in the screenshot correspond with?

> In the above screenshot, you cannot use "N" to add a new entity, only new text content! (Ideally there should be an N button in the entity row as well.)

I would be indebted if you could perhaps troubleshoot my config file in this respect (and maybe also the entry slices issue? Admittedly, this is off-topic).

> The "New" field additionally allows adding annotation types that are not present yet.

Could you perhaps give an example?

Much thanks!

proycon commented 3 years ago

> What is the annotation type the buttons in the screenshot correspond with?

The ones in the same row, so in your screenshot the "N" button corresponds to Text Annotation, not Entity Annotation.

> I would be indebted if you could perhaps troubleshoot my config file in this respect (and maybe also the entry slices issue? Admittedly, this is off-topic).

The configuration only allows enabling or disabling certain edit forms for all annotation types at once; it doesn't allow per-annotation-type settings, so it should be fine as it is. (But of course I can take a look at the slice issue.)

> The "New" field additionally allows adding annotation types that are not present yet. Could you perhaps give an example?

For example, in the screenshot you gave, you could add the "Language" annotation on that particular word, it would add a new row to the editor.

pirolen commented 3 years ago

Thanks!

Sorry for the trouble, but I have now updated FLAT and (regrettably, again!) forgot to log out of the editor and stop the FLAT web service first (multitasking...). Now there is the following problem:

Aug 10 12:23:23 badwver-itservice2 startFoliaServer.sh[13625]: CherryPy Checker:
Aug 10 12:23:23 badwver-itservice2 startFoliaServer.sh[13625]: The Application mounted at '' has an empty config

Which config is missing, and where should it be located? (I tried a few things, but they did not work...)

proycon commented 3 years ago

That's foliadocserve reporting, but it's a notice you can simply ignore, not an error.

pirolen commented 3 years ago

I see, but I get an Internal Server Error and FLAT does not load, so I'm rather confused about what went wrong :-(

proycon commented 3 years ago

You're running the production service through uWSGI, right? Did you restart that one? Does the uWSGI log say anything?

pirolen commented 3 years ago

Yes, I restarted uWSGI several times and also restarted nginx. The last part of the uWSGI log:

Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: uwsgi socket 0 bound to UNIX address dflat.sock fd 3
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Python version: 3.7.3 (default, Jan 22 2021, 20:04:44) [GCC 8.3.0]
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Python threads support is disabled. You can enable it with --enable-threads
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Python main interpreter initialized at 0x56220e979df0
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: your server socket listen backlog is limited to 100 connections
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: your mercy for graceful operations on workers is 60 seconds
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: mapped 437424 bytes (427 KB) for 5 cores
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Operational MODE: preforking
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: unable to load app 0 (mountpoint='') (callable not found or import error)
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: no app loaded. going in full dynamic mode
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: uWSGI is running in multiple interpreter mode
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: spawned uWSGI master process (pid: 34711)
Aug 10

pirolen commented 3 years ago

But such 'Internal Server Error' messages also appear simply when something is (formally) wrong in the config file, or even when the XML of the annotation sets is malformed. I may have accidentally changed something in the config file; I will check this.

proycon commented 3 years ago

Yes, it can't even import FLAT, so something must be going wrong at a fairly early stage. Errors in the FLAT config or in your Python environment (virtualenv) could cause this.

pirolen commented 3 years ago

Fixed it now: the contents of the wsgi.py file had accidentally been commented out :-o Thanks for your support! It would still be nice to fix the slices issue.
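uWSGI's "unable to load app 0 (mountpoint='') (callable not found or import error)" lumps two failure modes together. A small diagnostic sketch (standard library only, not part of FLAT) imports a wsgi.py by path and tells them apart, assuming the default callable name "application" that uWSGI looks for unless configured otherwise. Since FLAT is a Django application, importing its real wsgi.py also initialises Django, so run the check inside the same virtualenv and with the same environment variables as the uWSGI service. The demo files below are made up; the commented-out line is only a typical Django example, not FLAT's actual file.

```python
import importlib.util
import os
import tempfile
import traceback


def check_wsgi_file(path, callable_name="application"):
    """Import the Python file at `path` and report whether it defines a
    callable named `callable_name`."""
    spec = importlib.util.spec_from_file_location("wsgi_under_test", path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception:
        return "import error:\n" + traceback.format_exc()
    if not callable(getattr(module, callable_name, None)):
        return f"callable not found: {callable_name!r} is not defined in {path}"
    return "ok"


# Demo with two throwaway files: one whose body is commented out, one minimal app.
SOURCES = {
    "commented_out": "# application = get_wsgi_application()\n",
    "minimal": (
        "def application(environ, start_response):\n"
        "    start_response('200 OK', [('Content-Type', 'text/plain')])\n"
        "    return [b'ok']\n"
    ),
}
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, source in SOURCES.items():
        path = os.path.join(tmp, f"{name}_wsgi.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        results[name] = check_wsgi_file(path)

print(results["minimal"])        # ok
print(results["commented_out"])  # starts with "callable not found"
```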