Closed pirolen closed 3 years ago
The assignment works fine when the text class is 'OCR', and the text is structured as Entry.
If multiple labels need to be given to partly or fully overlapping spans of words (i.e., nested entity annotations), how are class values supposed to function? Should the uploaded documents have a text class other than 'current' (e.g. 'OCR')?
You should indeed be able to annotate multiple overlapping spans of words (technically they're not nested annotations: they're all specified at the same level and there's no hierarchy). The text class is indeed most often simply 'current'; the entity class of course comes from your own vocabulary.
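To illustrate what "same level, no hierarchy" means, here is a minimal sketch in Python that builds two sibling entity annotations over one word. It only mimics the general shape of a FoLiA entity layer (`entities`, `entity`, `wref`); the word ID is hypothetical, and namespaces and the `set` attribute are left out for brevity, so this is not a complete FoLiA fragment.

```python
import xml.etree.ElementTree as ET


def add_entity(layer, cls, word_ids):
    """Append one <entity> to the layer; it points at its words via <wref>."""
    entity = ET.SubElement(layer, "entity", {"class": cls})
    for wid in word_ids:
        ET.SubElement(entity, "wref", {"id": wid})
    return entity


# Hypothetical word ID, for illustration only.
word_id = "example.s.1.w.1"

layer = ET.Element("entities")
add_entity(layer, "Author", [word_id])  # first label on the word
add_entity(layer, "Lemma", [word_id])   # second label on the same word, as a sibling

print(ET.tostring(layer, encoding="unicode"))
```

Both entities are direct children of the same layer and simply reference the same word, so neither contains the other.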
I ask because I get an error when my original text class is 'current' and I try to tag the same span of words with multiple labels (e.g. 'Alwicus' is both Author and Lemma, please see screenshot). The error says that multiple text content elements with the class 'current' cannot be added to the same structural element.
Did you use the New drop-down box and the + button to add a new entity, after having added the first one? (Rather than editing the entity every time). That's the proper way to add multiple overlapping entities. I guess you did because in the second scenario it worked fine? If so then there might be a bug I need to look into. The text class (OCR or current) shouldn't really be a major factor in this. That error you got is rather odd as that query isn't adding/modifying any text content...
Did you use the New drop-down box and the + button to add a new entity, after having added the first one? (Rather than editing the entity every time). That's the proper way to add multiple overlapping entities.
It might be that I was trying to do several things at the same time, e.g. correcting while adding a new label. Because when I now try to open this same document from the Document Index, I get this error:
"Fatal Error: The document server returned an error
HTTP Error 404: Not Found
FoLiA error in pirolen/FA-MBK-4-3_035245008_0030_abpproc: [ParseError] FoLiA exception in handling of @ line 109 (in parent @ parent line 108) : [DuplicateAnnotationError] Can not add multiple text content elements with the same class (current) to the same structural element!
Query was: DECLARE correction OF https://raw.githubusercontent.com/pirolen/folia-resources/main/bibkat_ocr_corrections.foliaset.xml"
It looks like the document got corrupted in some earlier stage (which of course is a bug and shouldn't happen). Can you send the document so I can try to reproduce it?
Please find attached. I probably gave FLAT a hard time with my annotation tests and several config modifications (including changing the tagset itself) at the same time :-( FA-MBK-4-3_035245008_0030_abpproc.folia.xml.txt
Something went wrong here:
<w xml:id="FA-MBK-4-3_035245008_0030_abpproc.text.div.2.p.1.list.1.item.1.p.1.s.2.w.1" set="tokconfig-deu" class="WORD">
<t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl" datetime="2021-07-31T17:50:59">ecclesie</t>
<t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl">ecclesie</t>
</w>
An extra text element got added it seems, which is not allowed if it's in the same text class as the other. I'll check the log you provided earlier to see what edit caused it, and find a fix. This is definitely a bug, you probably get the errors you're reporting in the original issue because something went wrong earlier. Probably any edit now will give this error.
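To check other documents for the same kind of damage, a minimal sketch like the following could help. It uses only Python's standard library rather than FoLiApy, compares local tag names so that it works with or without the FoLiA namespace, and assumes that a `t` element without a `class` attribute belongs to the default text class 'current'. The sample and its shortened ID are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local(tag):
    """Strip a namespace prefix such as '{http://...}w' down to 'w'."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_text(root):
    """Yield (parent xml:id, text class) for parents with repeated <t> classes."""
    for parent in root.iter():
        counts = Counter(
            child.get("class", "current")  # assumed default text class
            for child in parent
            if isinstance(child.tag, str) and local(child.tag) == "t"
        )
        for cls, n in counts.items():
            if n > 1:
                yield parent.get(XML_ID), cls


# Shortened, illustrative version of the corrupted word element.
sample = """<w xml:id="w.1" class="WORD">
  <t datetime="2021-07-31T17:50:59">ecclesie</t>
  <t>ecclesie</t>
</w>"""

duplicates = list(find_duplicate_text(ET.fromstring(sample)))
print(duplicates)
```

For a real file you would pass `ET.parse(path).getroot()` instead of the in-memory sample.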
Yes, it seemed (also for other files that got corrupted) that whatever I do afterwards will give an error. Admittedly, I was making edits and corrections in large amounts on one and the same element, e.g. because sometimes I missed what span was highlighted and then wrongly assigned the tag, or kept changing the tagset, or tagged and corrected the orthography at the same time (is that possible at all?).
This one was the culprit:
ADD t OF https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl WITH text "ecclesie" datetime now confidence NONE FOR ID FA-MBK-4-3_035245008_0030_abpproc.text.div.2.p.1.list.1.item.1.p.1.s.2.w.1
The query failed though (as it should):
2021-07-31 17:50:59 - [QUERY FAILED] FoLiA Error in pirolen/FA-MBK-4-3_035245008_0030_abpproc: [DuplicateAnnotationError] Can not add multiple text content elements with the same class (current) to the same structural element!
File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/foliadocserve/foliadocserve.py", line 661, in query
result = query(doc,False,self.debug >= 2)
File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/fql.py", line 2135, in __call__
focusselection, targetselection = self.action(self, targetselector, debug) #selecting focus elements further constrains the target selection (if any), return values will be lists
File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/fql.py", line 1798, in __call__
focusselection.append( target.add(action.focus.Class, **action.assignments) ) #handles span annotation too
File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 2368, in add
return self.append(child,*args,**kwargs)
File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 4072, in append
e = super(AbstractStructureElement,self).append(child, *args, **kwargs)
File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 2205, in append
if dopostappend: child.postappend()
File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 4608, in postappend
raise DuplicateAnnotationError("Can not add multiple text content elements with the same class (" + c.cls + ") to the same structural element!")
I wonder why it ended up in the document after all then
or tagged and corrected the orthography at the same time (is that possible at all?)
Yep, you can do multiple annotations in one go.
And whatever you do, the tool should be robust against corrupting the document of course, so I consider this a good stress test :)
FoLiApy v2.5.5 is released now and should provide more safeguards against corrupted documents, which is what led to this issue.
You'll still have to fix the corrupted document manually for now though (remove the duplicate line 109).
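If several documents are affected, the manual fix could be scripted. The sketch below is only an assumption about one reasonable approach, not part of FLAT or FoLiApy: within each parent it keeps the first `t` element of a given text class and removes later ones of the same class. It treats a missing `class` attribute as 'current'. Here both copies carry the same text, so which one survives does not matter; check that before applying this to other files, and work on a backup.

```python
import xml.etree.ElementTree as ET


def drop_duplicate_text(root):
    """Remove <t> children that repeat a text class already seen in the same
    parent, keeping the first occurrence. Returns the number removed."""
    removed = 0
    for parent in list(root.iter()):  # snapshot, since the tree is modified
        seen = set()
        for child in list(parent):
            if not isinstance(child.tag, str):
                continue  # skip comments and processing instructions
            if child.tag.rsplit("}", 1)[-1] != "t":
                continue  # only text content elements are of interest
            cls = child.get("class", "current")  # assumed default text class
            if cls in seen:
                parent.remove(child)
                removed += 1
            else:
                seen.add(cls)
    return removed


# Shortened, illustrative version of the corrupted word element.
root = ET.fromstring(
    '<w class="WORD">'
    '<t datetime="2021-07-31T17:50:59">ecclesie</t>'
    '<t>ecclesie</t>'
    '</w>'
)
removed = drop_duplicate_text(root)
print(removed, ET.tostring(root, encoding="unicode"))
```

Note that ElementTree may rewrite namespace prefixes when it serialises a namespaced document, so for a single duplicate line a text editor remains the simpler option.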
I see, thanks. Admittedly, I am not sure about the functionalities of the drop-down list 'New' in the lower part of the dialog window. Probably I have not used it much so far, but rather used the 'N' button (above, in the 'Text' area) plus the Entity drop-down to add new annotations :-o which worked out, but possibly was not meant to? So:
What is the difference between using the 'N' button to add new annotations, as opposed to the drop-down list?
You can use the 'N' button but it only applies to the annotation type the buttons correspond with! In the above screenshot, you cannot use "N" to add a new entity, but only for new text content! (Ideally there should be an N button in the entity row as well)
The "New" field additionally allows adding annotation types that are not present yet.
When would one want to add a new string, or a new text element?
Only when you have multiple text layers in your document, I don't think this is something you'll want to do much in practice. The correction facility (or a direct edit) is usually more appropriate.
You can use the 'N' button but it only applies to the annotation type the buttons correspond with!
What is the annotation type the buttons in the screenshot correspond with?
In the above screenshot, you cannot use "N" to add a new entity, but only for new text content! (Ideally there should be an N button in the entity row as well)
I would be indebted if you could perhaps troubleshoot my config file in this respect... (also the entry slices issue maybe? Admittedly, this is offtopic).
The "New" field additionally allows adding annotation types that are not present yet. Could you perhaps give an example?
Much thanks!
What is the annotation type the buttons in the screenshot correspond with?
The ones in the same row, so in your screenshot the "N" button corresponds to Text Annotation, not Entity Annotation.
I would be indebted if you could perhaps troubleshoot my config file in this respect... (also the entry slices issue maybe? Admittedly, this is offtopic).
The configuration only allows for enabling or disabling certain edit forms for all annotation types, it doesn't allow setting per annotation type, so it should be fine as it is. (but I can take a look for the slice issue of course)
The "New" field additionally allows adding annotation types that are not present yet. Could you perhaps give an example?
For example, in the screenshot you gave, you could add the "Language" annotation on that particular word, it would add a new row to the editor.
Thanks!
Sorry for the trouble, but I have now updated FLAT and (regrettably, again!) forgot to log out of the editor and stop the FLAT webservice... (multi-tasking...) Now there is the following problem:
Aug 10 12:23:23 badwver-itservice2 startFoliaServer.sh[13625]: CherryPy Checker:
Aug 10 12:23:23 badwver-itservice2 startFoliaServer.sh[13625]: The Application mounted at '' has an empty config
Which config is being missed and where should it be located? (I tried some things but they did not work...)
That's foliadocserve reporting, but it's a notice you can simply ignore, not an error.
I see -- but there is an Internal Server Error and no FLAT, so I am kind of confused about what went wrong :-(
You're running the production service through uwsgi right? Did you restart that one? Does the uwsgi log say anything?
Yes, restarted uwsgi several times, also restarted nginx. The uwsgi log's last part:
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: uwsgi socket 0 bound to UNIX address dflat.sock fd 3
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Python version: 3.7.3 (default, Jan 22 2021, 20:04:44) [GCC 8.3.0]
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Python threads support is disabled. You can enable it with --enable-threads
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Python main interpreter initialized at 0x56220e979df0
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: your server socket listen backlog is limited to 100 connections
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: your mercy for graceful operations on workers is 60 seconds
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: mapped 437424 bytes (427 KB) for 5 cores
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: Operational MODE: preforking
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: unable to load app 0 (mountpoint='') (callable not found or import error)
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: no app loaded. going in full dynamic mode
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: uWSGI is running in multiple interpreter mode
Aug 10 13:33:50 badwver-itservice2 startUwsgi.sh[34709]: spawned uWSGI master process (pid: 34711)
Aug 10
But such 'Internal Server Error' messages also appear simply when something is (formally) wrong in the config file, or even when the XML of the annotation sets is malformed. It may be that I accidentally made a change in the config file; I will try to check this.
Yes, it can't even import flat, so something must be going wrong at a fairly early stage. Errors in the flat config or in your Python environment (virtualenv) could cause this.
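One way to surface the underlying import error, which uwsgi only summarises as "callable not found or import error", is to try the import by hand inside the same virtualenv. This is a minimal sketch; the module name `flat.wsgi` is an assumption here, so substitute whatever module your uwsgi configuration actually points at.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module by name.

    Returns None on success, or the formatted traceback if the import fails.
    """
    try:
        importlib.import_module(module_name)
    except Exception:
        return traceback.format_exc()
    return None


# "flat.wsgi" is an assumed module name, used for illustration only.
error = try_import("flat.wsgi")
print(error or "import OK")
```

Run it with the virtualenv's own Python interpreter, so the same packages and paths are in effect as for the service.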
Fixed it now: the wsgi.py file was accidentally commented out :-o Thanks for your support! The slices issue would be nice to remedy still.
Admittedly, I am not fully aware of all the requirements for an optimal configuration of an annotation scheme.
Here are my labels: https://github.com/pirolen/folia-resources/blob/main/bibkat_entities.foliaset.xml
Attached are screenshots of the GUI and the error message.
I am going to test if I can assign the two labels if the original text class is other than 'current'.
Can it also be that actually the way the text is structured (div > par > list > listItem) restricts what can happen during annotation? (This is btw not an ideal structuring, I'd like to represent this data as Entries, but could not get Entries rendered as such so this was a workaround test.)
And is it valid to perform in a single step the correction of a string, as well as its labeling?