Curation does not automatically take stacked annotations

toltoxgh commented 3 years ago

Describe the bug: I created a custom Layer with sentence-level granularity and overlap "Stacking only". I also created a string Tagset and added it as a Feature to this custom Layer.

During curation, sentences annotated with a single annotation are automatically put into the topmost curation space as suggestions if there is agreement, which is nice.

However, if a sentence is annotated with two or more annotations, no annotation will be put automatically to the top curation selection, even if there is no disagreement between annotators.

I tested this with a project that has only a single annotator, and the behavior described above occurs even in that case.

Could this be addressed so that stacked annotations that the annotators agree on are automatically put into the curation space above?

Expected behavior: Stacked annotations that the annotators agree on should be automatically put into the curation space above.

Version: I tested this on version 0.17.2.

reckart commented 3 years ago

Why does one of your annotators create multiple annotations of the same type with the same labels at the same location?

toltoxgh commented 3 years ago

In our domain, it is reasonable that a sentence can have more than one annotation with the custom Layer.

For example, the Layer could be "Sentence_Topic" and the String Tagset "Topic_Tagset" could be defined as "Technology, Medicine, Space, Physics, Geology".

The example sentence "Scientists are using computational modeling techniques to investigate molecular interactions for medical research" would be annotated with "Technology" and "Medicine".

In these cases, the curator must click on the annotator's annotations during curation; they are not automatically put on top, even if there was agreement or only one annotator in the project. There is also no indication via the red markers on the left that something in this sentence requires attention during curation, which could make curation error-prone in this case.

reckart commented 3 years ago

The merge code calculates a "Diff" between the various annotators. This "Diff" consists of "ConfigurationSets" representing the annotations at a certain logical position. Within the "ConfigurationSet", we have a number of "Configurations", each essentially representing a particular feature value (or combination of feature values if the annotation has more than one feature).

Currently, if a user makes multiple annotations at the same logical position, the system interprets that as the annotator being unsure (not agreeing with him/herself) and thus completely discards that position from its considerations. The "Configuration" even goes so far as to only retain information about a single feature value combination from each annotator.
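The behavior described above can be illustrated with a minimal Python sketch. This is not the INCEpTION merge code. The function name `premerge_current` and the `(annotator, position, label)` tuples are assumptions made for illustration, and the sketch only models the rule "a position where any annotator stacked is skipped".

```python
from collections import defaultdict


def premerge_current(annotations):
    """Model of the described rule (hypothetical helper, not INCEpTION code).

    annotations: iterable of (annotator, position, label) tuples.
    Returns {position: label} for positions that would be pre-merged.
    """
    by_position = defaultdict(lambda: defaultdict(list))
    for annotator, position, label in annotations:
        by_position[position][annotator].append(label)

    merged = {}
    for position, per_annotator in by_position.items():
        # Several annotations by one annotator at one position are treated
        # as self-disagreement, so the whole position is skipped.
        if any(len(labels) > 1 for labels in per_annotator.values()):
            continue
        distinct = {labels[0] for labels in per_annotator.values()}
        if len(distinct) == 1:
            merged[position] = distinct.pop()
    return merged
```

Under this model, a single annotator who stacks "Technology" and "Medicine" on one sentence gets nothing pre-merged for that sentence, which matches the single-annotator observation in the original report.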

I believe what you'd be asking for would entail that we retain all the feature value combinations and in case a particular feature combination has been provided by all annotators, then consider it agreement and pre-merge it.

I think we had tried something like this way back, but at least back then abandoned it because it ended up being quite complicated (https://github.com/webanno/webanno/issues/21)...

Also, I believe things may become tricky if the spans are involved in relations or act as slot fillers.

Might be worth giving it another try though...

There are also two alternatives that could be considered:

implementing multi-valued features: instead of having multiple annotations with different labels, have one annotation with multiple labels - also not something to implement in an afternoon though...
having a different layer type for each kind, e.g. a Technology layer and a Medicine layer - that is a bit more inconvenient on the annotation page because it means often switching between layers - but it should work out of the box as desired in terms of curation.

toltoxgh commented 3 years ago

Thanks, your suggestion of defining a distinct layer for each single annotation makes sense, even if it is not as convenient for rapid annotations, because the annotators would need to switch layers all the time.

For a new project, this might be the way to go for us; for the current one, it is too late to change the setup.

If the merge code is too hard to adjust, I think it would already greatly help if during curation, the program would indicate, via the red markers on the left, if an annotator annotated multiple annotations at the same logical position.

That would give a cue to the curator to look at this particular place.

As it is now, it is too easy for a curator to completely miss annotations, because stacked annotations at the same logical position are disregarded as you wrote, and there is no indication that this has happened unless the curator scrolls through every sentence and checks for this, which is cumbersome for a larger corpus. Such an update would help already.

reckart commented 3 years ago

The different coloring will be handled as part of https://github.com/inception-project/inception/pull/2374

GiantEnemyCrab commented 2 years ago

After completing migration from WebAnno 3.3.5 to INCEpTION, I did notice that stacked tags are no longer making it to the curation window. And I realize I filed a similar Github issue in https://github.com/webanno/webanno/issues/1226.

Since there are a lot of stacked tags in the annotation tasks in my environment, I wanted to come back to this and also share what I have done so far.

Using the REST API, a custom Python script goes through the document list to find documents that have at least a configurable number of completed annotations. The first annotator's document is then used as the baseline and compared against the documents of the other annotator(s).

I took a lenient approach. For span-based tags, I only check that all other annotators have the same tag name on the same span. For link/relation annotations, I check that the same relation triplet (role/relation name, parent tag, child tag) exists for all other annotators. This is a subtractive approach: any annotation that another annotator disagrees with is removed from the first annotator's document, which is then saved as the "curator" document. I first delete the disagreed link/relation arrows, then the disagreed tags, and then save the XMI file under a different directory.
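The subtraction step could look roughly like the sketch below. It is not the actual script: the dict layout, the function name `subtract_disagreements`, and the use of plain tuples instead of XMI/CAS objects are all assumptions for illustration. The final clean-up of relations whose endpoints were removed is an addition that the description above does not mention.

```python
def subtract_disagreements(baseline, others):
    """Keep only what every other annotator also has (hypothetical helper).

    baseline, others[i]: dicts with
      "spans":     set of (begin, end, tag)
      "relations": set of (name, parent_span, child_span)
    """
    spans = set(baseline["spans"])
    relations = set(baseline["relations"])

    for other in others:
        # Relations first, then spans, mirroring the order described above.
        relations &= other["relations"]
        spans &= other["spans"]

    # Assumption: a relation cannot survive if one of its endpoints was removed.
    relations = {r for r in relations if r[1] in spans and r[2] in spans}
    return {"spans": spans, "relations": relations}
```

Because spans are held in a set, two stacked tags with the same name on the same span collapse into one entry here; only stacked tags with different names are kept apart.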

The REST API is then used to upload that custom merged XMI file to INCEpTION as the "CURATION-IN-PROGRESS" document. (I don't think there is a way to upload it as "NEW" curation document via API?) This script runs periodically from crontab, so by the time the curator arrives, these curations are pre-merged.

At least, this way, I could replicate the behavior of merging annotations mostly the same as how it was done back in WebAnno version 3.3.5, and annotators were fine with the behavior of WebAnno 3.3.5 for curation merge. Back in version 3.3.5, while stacked tags that had the same tag name didn't transfer to curation window, at least stacked tags of different tag names did, and it was sufficient.

So, I think there could be a checkbox option that lets stacked tags not be treated as confusion or self-disagreement.

I also discussed with annotators about the work-around that was proposed in this issue earlier, by using different layers. But this option was not adopted because switching between layers was too tiring to be productive in the kind of annotation spec that has a lot of stacked tags.

In addition, multi-valued annotation was tried initially, but this approach didn't last long because it was not straightforward to determine which of the values was the one used for the relation/link, and same-tag-name stacking would require other values to be annotated multiple times. So this approach was given up in favor of doing relation linking per single-value tag.

reckart commented 2 years ago

Have you seen the new merge strategies in INCEpTION?

The most flexible one is the threshold-based one which you can configure for 1) how many users must have made an annotation (e.g. at least three users) and then 2) how strong the vote for the majority label must be compared to the other labels (called confidence threshold).
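As a rough illustration of the two settings, here is a Python sketch for a single position without stacking. The function name and the way confidence is computed (share of votes going to the best label, as a fraction between 0 and 1) are assumptions; the formula INCEpTION actually uses may differ.

```python
from collections import Counter


def threshold_merge(votes, user_threshold, confidence_threshold):
    """Pick a label for one position, or None (hypothetical model).

    votes: one label per annotator who annotated this position.
    """
    # 1) Enough annotators must have annotated the position at all.
    if len(votes) < user_threshold:
        return None

    ranked = Counter(votes).most_common()
    best_label, best_count = ranked[0]

    # A tie for first place is never merged in this model.
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return None

    # 2) Assumed confidence: fraction of all votes that went to the best label.
    confidence = best_count / len(votes)
    return best_label if confidence >= confidence_threshold else None
```

With three annotators voting X, X, Y, the assumed confidence is about 0.67, so a confidence threshold of 0.6 merges X while 0.75 merges nothing.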

GiantEnemyCrab commented 2 years ago

I haven't used this yet since the version of INCEpTION being deployed is 20.4. (The AppStream environment offered Firefox 61 as default, and BRAT couldn't be loaded on this old version of Firefox from 2018... so looking to see if custom AppStream image could be built to install more up-to-date browser)

I will keep this in mind! Seems very useful.

This, however, doesn't solve the issue of stacked tags being merged into the curation window, but I take your comment as information about the user-threshold implementation in INCEpTION, which I appreciate.

By the way, the unit is shown as "%", but is it actually meant to be a percentage? A confidence of "1%" or "0.75%" seems very low.

reckart commented 2 years ago

If you allow stacking on a span layer, then you should also be able to merge different annotations from two annotators and they get stacked.

GiantEnemyCrab commented 2 years ago

Yes, thank you for the screenshots, and manually adding the stacked tags is working fine.

My question is the same as the original reporter of this issue for how "initial curation merge" takes place. (Or when manually re-merging in curation).

If the documents have a lot of stacked tags, then it would be rather time consuming for the curator to go through all the stacked tags of all annotators. For example, if going with THYME spec (http://clear.colorado.edu/compsem/documents/THYME_guidelines.pdf), each concept can have "actual", "hypothetical", "generic", or "hedged" attribute tag on top of the primary concept.

I haven't looked at the code from WebAnno version 3.3.5, which actually had preferred behavior on my end, but looking at latest CAS merge and casdiff codes in INCEpTION (version 23), Position looks to be the "key" part of key-value pairs, which become the basis of calculating annotation differences. So, I am guessing that stacked tags would be not considered as candidates of agreement because of the same Position.

I do understand that the logic would be very complex, and I haven't come up with logic that accommodates all kinds of stacked tags. I think if the "key" became Position AND the tag name, the situation would be much better for my cases, because stacked tags with different names would be fine to treat as agreement.
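A minimal sketch of that idea, assuming annotations are reduced to `(annotator, position, label)` tuples; the function name is hypothetical and this does not reflect the actual CasDiff code:

```python
from collections import defaultdict


def agreed_by_position_and_label(annotations, annotators):
    """Return the (position, label) keys that every annotator provided.

    annotations: iterable of (annotator, (begin, end), label).
    annotators:  all annotators expected to agree.
    """
    voters = defaultdict(set)
    for annotator, position, label in annotations:
        # Keying by position AND label keeps differently named stacked
        # tags apart instead of lumping them into one position.
        voters[(position, label)].add(annotator)

    everyone = set(annotators)
    return sorted(key for key, who in voters.items() if who == everyone)
```

Stacked tags with the same name still collapse into a single key under this scheme, so that case stays ambiguous.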

reckart commented 2 years ago

> I don't think there is a way to upload it as "NEW" curation document via API?

If there is a curation document, then the state of the related source document should be "CURATION_IN_PROGRESS". The state of a source document would only be "NEW" if none of the annotators has started annotating yet. Per definition, when curation starts, there must already have been annotators that have annotated something. I think what you might be looking for is a "CURATION_NOT_YET_STARTED" state - that exists and is called "ANNOTATION_FINISHED". "ANNOTATION_FINISHED" means that all annotators have either completed their annotations for the document or the document is not accessible to them and that curation has not yet started.

reckart commented 2 years ago

As you said, the merging approach is based on the Position. Currently, what the merge algorithm seems to do is that for every Position, zero or one Configuration (that's like a tuple of position and feature values) can be chosen by the merge strategy to be merged from the annotators' documents to the curated document.

It would seem that in order to permit the merging of multiple "stacked" annotations from the annotators to the curation document, it would be necessary to change the chooseConfigurationToMerge(...) from

    public Optional<Configuration> chooseConfigurationToMerge(DiffResult aDiff,
            ConfigurationSet aCfgs)

to

    public List<Configuration> chooseConfigurationToMerge(DiffResult aDiff,
            ConfigurationSet aCfgs)

and make it so that an attempt to merge any of the configurations returned by the method is made. Of course returning more than one configuration would only be valid if the target layer actually allowed stacking.

Then, I think e.g. the ThresholdBasedMergeStrategy could be adapted to not only merge the best configuration but possibly the n-best or any configurations that have sufficient support from the annotators.
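To make the shape of that change concrete, here is a small Python model of a strategy that returns a list instead of at most one configuration, plus a caller that enforces the stacking restriction. The function names, the flat list of votes, and the `min_votes` parameter are assumptions for illustration; the real change would be in the Java merge strategy classes.

```python
from collections import Counter


def choose_configurations_to_merge(votes, min_votes):
    """Return every label with sufficient support (hypothetical strategy).

    votes: all labels given at one position, across all annotators.
    """
    counts = Counter(votes)
    return [label for label, n in sorted(counts.items()) if n >= min_votes]


def merge(votes, min_votes, allows_stacking):
    """Attempt to merge every chosen configuration for one position."""
    chosen = choose_configurations_to_merge(votes, min_votes)
    # More than one result is only valid if the target layer allows stacking.
    if len(chosen) > 1 and not allows_stacking:
        return []
    return chosen
```

In this model, a layer without stacking behaves as before: it receives either a single configuration or nothing.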

GiantEnemyCrab commented 2 years ago

> > I don't think there is a way to upload it as "NEW" curation document via API?
>
> If there is a curation document, then the state of the related source document should be "CURATION_IN_PROGRESS". The state of a source document would only be "NEW" if none of the annotators has started annotating yet. Per definition, when curation starts, there must already have been annotators that have annotated something. I think what you might be looking for is a "CURATION_NOT_YET_STARTED" state - that exists and is called "ANNOTATION_FINISHED". "ANNOTATION_FINISHED" means that all annotators have either completed their annotations for the document or the document is not accessible to them and that curation has not yet started.

Thanks for the response, and ANNOTATION_FINISHED makes sense to me!

reckart commented 2 years ago

When you upload a curation document via the API, then the curation must obviously have started already - so the state that can be set there via the API is only "CURATION_IN_PROGRESS" or "CURATION_FINISHED". It would only be "ANNOTATION_FINISHED" if there would not yet be a curation document - but that of course exists because it was uploaded through the API.

reckart commented 2 years ago

I can see why in your case you would want to upload a curation document but still have it marked as "ANNOTATION_FINISHED" - so that the curators know what they have already touched or not. It's tricky - I am not sure exactly what implications that might bring with it. For example it could be that on project export, the curation document might not be exported in the archive because according to the state, the document is not in curation and should not have a curation document yet...

GiantEnemyCrab commented 2 years ago

> I can see why in your case you would want to upload a curation document but still have it marked as "ANNOTATION_FINISHED" - so that the curators know what they have already touched or not. It's tricky - I am not sure exactly what implications that might bring with it. For example it could be that on project export, the curation document might not be exported in the archive because according to the state, the document is not in curation and should not have a curation document yet...

This is rather specific to a case of "bring your own initial curation", and I don't imagine others doing the kind of thing I've been doing. So I don't think any support needs to be added for this. If there were more options for the CAS merge, like stacked tag merging, I wouldn't be doing annotation merging outside the tool. Then there would be no need for the REST API to be able to set documents to "ANNOTATION_IN_PROGRESS" either.

reckart commented 2 years ago

@GiantEnemyCrab @toliwa a PR allowing merging of stacked annotations in conjunction with the threshold-based merge strategy has been merged into main - care to test-drive?

GiantEnemyCrab commented 2 years ago

@reckart

Thank you so much for the notice on this! I have tested it and need help making it work: when I "re-merge", I am still not getting the agreed stacked tags from user1 and user2 in this simple exported project:

stacked_tags_proj15278849064284370577.zip

I used threshold merge but I must not be setting things correctly?

reckart commented 2 years ago

What settings fail for you?

These seem to work for me on your project:

GiantEnemyCrab commented 2 years ago

Thank you so much for the guidance! I was confused on Top-voted and I was setting that to 1. After setting that to 2, it worked!

I am going to try more complex cases next and will come back here to post a comment again to report any findings within three days.

reckart commented 2 years ago

I have written a bit of documentation here: https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/view/INCEpTION/job/INCEpTION%20main/de.tudarmstadt.ukp.inception.app$inception-app-webapp/doclinks/1/#_merge_incomplete_agreeing_non_stacked_annotations

Note in particular the current section on the threshold based strategy:

Top-voted: when set to 1, only the single best label is pre-merged. If there is a tie on the best label, then nothing is merged. When set to 2 or higher, the respective n best labels are pre-merged. If there is any tie within the n best labels, then all labels that still meet the lowest score of the tie are merged as well. For example, if set to 2 and three annotators voted for label X and another two annotators voted for Y and Z respectively, then Y and Z have a tie at the second rank, so both of them are merged. Note that this setting only affects annotations on layers that allow stacking annotations. For other layers, an implicit setting of 1 is used here.
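The tie rule in that description can be modeled in a few lines of Python. This is a sketch of the rule as worded above, not the actual implementation; the function name `top_voted` is hypothetical, and the user and confidence thresholds are left out.

```python
from collections import Counter


def top_voted(votes, top_n):
    """Return the labels to pre-merge for one position (hypothetical model).

    votes: all labels given at the position, across all annotators.
    """
    # Rank by vote count (descending), then by label for a stable order.
    ranked = sorted(Counter(votes).items(), key=lambda kv: (-kv[1], kv[0]))
    if not ranked:
        return []

    if top_n == 1:
        # Strict mode: a tie on the best label means nothing is merged.
        if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
            return []
        return [ranked[0][0]]

    # Lax mode: every label that reaches the score of the n-th rank is kept,
    # so ties at the cut-off are all included.
    cutoff = ranked[min(top_n, len(ranked)) - 1][1]
    return [label for label, count in ranked if count >= cutoff]
```

For the example in the documentation (three votes for X, one each for Y and Z, top-voted set to 2), this model returns X, Y and Z, because Y and Z tie at the second rank.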

The tricky thing, I think, is dealing with the ties. If you say "top results 3", does it mean you get only 3 different labels merged? What if there is a tie between the top 4 or 5 labels? Do we include them or exclude them? What if we say top-results 3 and there is a tie between the first two labels? Do we exclude them but include the third one, which does not have a tie? In the previous implementations, where only one label was allowed to be merged, ties were never merged. I have tried to preserve that when top-voted is set to 1 but to handle ties more laxly when top-voted is 2 or greater.

GiantEnemyCrab commented 2 years ago

I think both approaches are fine, and it depends on the use case. It might be yet another config option, but something like a checkbox for "allow top N ties to be merged" could work. If it is not checked, ties are not allowed and only the specified N labels are merged, with ties broken either randomly or by alphabetical order of the label name.

By the way, what do you think of a case like the screenshot below:

Two lorem tags are stacked on the same span with same slot links between user1 and user2. But in curation, it is merged as a single lorem tag. This might be a complex case.

reckart commented 2 years ago

Well, first, let's consider that there is no link - how can you tell which lorem tag is which? The algorithm sees that both users have assigned lorem at least once - it conflates these since no difference between these spans can be seen and merges them up.

As a second step, it looks at the slots. It finds the merged lorem tag and the targets to it.

That said, I'm surprised that one of the "lorem" tags in each of the users is black? I assume they are all from the same layer? Black is normally not a color that should be assigned, so something fishy seems to be happening here...

GiantEnemyCrab commented 2 years ago

Stacking the same tag is really complex. Perhaps, as long as the combination of parent tag, link/relation, and child tag matches among annotators, it could be merged into the curation. However, this can be ignored for now.

I've also attached the exported project here, in case you are curious about what might be happening: same_tag_stacked_example2715221680151684197.zip

inception-project / inception

Curation does not automatically take stacked annotations #1893