default idr workflows fail in outreach situation

pwalczysko commented 4 years ago

Run a workflow such as the one for idr0021 in a server where the data to be annotated are duplicated within a read-annotate group (each set of data belongs to a different user, but the users are in the same group and the data are identical in all other respects, except the differing IDs of objects from user to user).
Use the default csv file and the default bulkmap file from https://github.com/IDR/idr0021-lawo-pericentriolarmaterial/tree/master/experimentA
Run the workflow sequentially, user after user
Observe success with the first user
Observe that the bulk annotation creation step succeeds for the second user, but
Observe that the bulkmap step fails for the second user, see below

[importer1@ome-training-1 in-place-import]$ omero metadata populate --context bulkmap --cfg idr0021-scripts/idr0021-experimentA-bulkmap-config.yml --batch 100 Project:976

...
Exception: Duplicate MapAnnotation primary key: id:119824 ns:openmicroscopy.org/mapr/organism primary:('openmicroscopy.org/mapr/organism', frozenset({('Organism', 'Homo sapiens')})) keyvalues:[('Organism', 'Homo sapiens')] parents:set() id:119824

Note that this issue is solved once the "Advanced features" section of the bulkmap config is deleted.
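For anyone wanting to script that workaround, here is a minimal sketch of dropping the section from an already-parsed config. It assumes the "Advanced features" block lives under a top-level `advanced` key; that key name, the `without_advanced` helper and the example dictionary are illustrative assumptions, not a copy of the real idr0021 file.

```python
import copy


def without_advanced(config: dict, section: str = "advanced") -> dict:
    """Return a copy of a parsed bulkmap config without the advanced section.

    `section` is an assumed key name; adjust it to whatever the real file uses.
    """
    cleaned = copy.deepcopy(config)  # leave the caller's dictionary untouched
    cleaned.pop(section, None)       # no error if the section is absent
    return cleaned


# Illustrative config only, shaped loosely like a parsed YAML file.
example_config = {
    "name": "example-study/experimentA",
    "columns": [{"name": "Organism"}],
    "advanced": {
        "primary_group_keys": [
            {"namespace": "openmicroscopy.org/mapr/organism", "keys": ["Organism"]}
        ]
    },
}

cleaned_config = without_advanced(example_config)
```

The cleaned dictionary can then be written back out with whichever YAML library is in use and passed to `--cfg`.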

Not sure how important this issue is for IDR, but for outreach it is a blocker for reusing the bulkmap YAML files from the IDR repos as they are.

cc @manics @joshmoore @sbesson @francesw

pwalczysko commented 4 years ago

Sorry, I edited the error; hopefully it makes sense now.

sbesson commented 4 years ago

If primary_group_keys are defined in the YML configuration file, the existing map annotations for the given namespaces are queried, and duplicate key/value pairs raise an error. The exception is raised by this section of the code:

https://github.com/ome/omero-py/blob/337018ec348805443a13210295d5f7e7b11923d7/src/omero/util/metadata_mapannotations.py#L217-L223

The bulk2map workflow was primarily designed in the context of the IDR, i.e. where all data belong to a single user in a single group. My suspicion from the issue description is that the error above reflects the lack of support for the multi-user scenario. It might be useful to know how many Homo sapiens annotations of the given namespace exist in the system and who owns them when the error is thrown.
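To make the failure mode concrete, here is a simplified sketch of this kind of duplicate check. It is not the omero-py implementation: `CanonicalRegistry` and its methods are hypothetical names, and the only thing taken from the traceback above is the shape of the primary key, a `(namespace, frozenset of key/value pairs)` tuple.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

KeyValue = Tuple[str, str]
PrimaryKey = Tuple[str, FrozenSet[KeyValue]]

NS_ORGANISM = "openmicroscopy.org/mapr/organism"


class CanonicalRegistry:
    """Hypothetical stand-in for the canonical map annotation bookkeeping."""

    def __init__(self, primary_keys_by_ns: Dict[str, List[str]]):
        self.primary_keys_by_ns = primary_keys_by_ns
        self.seen: Dict[PrimaryKey, int] = {}

    def primary_key(self, ns: str, keyvalues: List[KeyValue]) -> Optional[PrimaryKey]:
        keys = self.primary_keys_by_ns.get(ns)
        if not keys:
            return None  # namespace has no primary keys configured
        return ns, frozenset((k, v) for k, v in keyvalues if k in keys)

    def add(self, ann_id: int, ns: str, keyvalues: List[KeyValue]) -> None:
        pk = self.primary_key(ns, keyvalues)
        if pk is None:
            return
        if pk in self.seen:
            # Two existing annotations share one primary key: refuse to continue.
            raise ValueError(
                f"Duplicate MapAnnotation primary key: id:{ann_id} primary:{pk}"
            )
        self.seen[pk] = ann_id


registry = CanonicalRegistry({NS_ORGANISM: ["Organism"]})
# Annotation owned by the first user: registered without problems.
registry.add(1, NS_ORGANISM, [("Organism", "Homo sapiens")])
try:
    # Identical annotation owned by a second user in the same group.
    registry.add(2, NS_ORGANISM, [("Organism", "Homo sapiens")])
except ValueError as exc:
    duplicate_error = str(exc)
```

In a single-user IDR setup the second `add` never happens, because only one annotation per primary key exists. In a read-annotate group every member's copy is visible to the query, so the check trips as soon as a second user has the same key/value pair.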

Moving forward, we need to discuss and agree on the expected behavior of the primary_group_keys feature in a multi-user, multi-group context. I think the first question is to define the scope of canonical map annotations, i.e.:

canonical map annotations is a group concept: in a read-annotate or read-write group, canonical map annotations owned by any user should be reused by other group members when linking
canonical map annotations is a user concept: independently of the group permissions, the canonical annotations should be restricted to the scope of the current user
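The difference between the two options can be sketched as a choice of lookup key. This is only an illustration of the two scopes under discussion, not existing OMERO behavior; `CanonicalIndex`, `get_or_create` and the integer IDs are hypothetical.

```python
from typing import Dict, Iterable, Tuple

KeyValue = Tuple[str, str]


class CanonicalIndex:
    """Hypothetical index of canonical annotations, scoped per group or per user."""

    def __init__(self, scope: str):
        if scope not in ("group", "user"):
            raise ValueError(f"unknown scope: {scope}")
        self.scope = scope
        self._index: Dict[tuple, int] = {}
        self._next_id = 1

    def get_or_create(
        self, ns: str, keyvalues: Iterable[KeyValue], owner: str, group: str
    ) -> int:
        content = (ns, frozenset(keyvalues))
        if self.scope == "group":
            key = (group, content)          # any group member's annotation is reused
        else:
            key = (group, owner, content)   # only the caller's own annotation is reused
        if key not in self._index:
            self._index[key] = self._next_id
            self._next_id += 1
        return self._index[key]


NS = "openmicroscopy.org/mapr/organism"
KV = [("Organism", "Homo sapiens")]

group_scoped = CanonicalIndex("group")
shared_ids = (
    group_scoped.get_or_create(NS, KV, owner="user-1", group="RA"),
    group_scoped.get_or_create(NS, KV, owner="user-2", group="RA"),
)

user_scoped = CanonicalIndex("user")
separate_ids = (
    user_scoped.get_or_create(NS, KV, owner="user-1", group="RA"),
    user_scoped.get_or_create(NS, KV, owner="user-2", group="RA"),
)
```

With the group scope both users end up linked to the same annotation; with the user scope each user gets their own, and neither case needs to raise on a "duplicate".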

pwalczysko commented 4 years ago

It might be useful to know how many Homo sapiens annotations of the given namespace exist in the system and who owns them when the error is thrown.

The status quo when the exception was thrown is ca. 51 Homo sapiens annotations with the same namespace. These are owned by 50 different users (user-x, where x goes from 1 to 50) and 1 trainer. All the users are members of the RA group, and all the data discussed here are in that RA group. The images annotated by that KVP are also owned by the respective users. I think I remember that the error is triggered even if just 1 other user has the Homo sapiens annotation and another user is trying to add it to their data. Again, I am talking here about the user/group setup as described in this paragraph (50 users, 1 of whom has the Homo sapiens annotation already, and another one is trying to add their own).

pwalczysko commented 4 years ago

canonical map annotations is a group concept: in a read-annotate or read-write group, canonical map annotations owned by any user should be reused by other group members when linking

This would be a "special group" situation, such as publication groups, or rw groups where the data are strictly cooperative. It is hard to imagine that level of cooperation in a "normal" vanilla OMERO usage setup (thinking about the policing effort needed to wipe out duplication attempts and to enable the finding and linking of the existing annotation which belongs to a different user).

canonical map annotations is a user concept: independently of the group permissions, the canonical annotations should be restricted to the scope of the current user

Is this then not canonical anymore? The user would be only in canon with themselves, not with others, if I get it right.

In summary, I am not too worried about this. The whole issue is just that the IDR workflows cannot be transferred to a vanilla OMERO environment as they stand, and adjusting the bulkmap config to get the canonical annotations completely out of the way seems to me a good solution. The two options suggested above would have to be explained most carefully to the user, and I do not have a feeling or examples from the field at the moment showing that users need them.

sbesson commented 4 years ago

It occurs to me that in the context of https://github.com/ome/omero-guide-upload/pull/6, if one of the goals is to consume an IDR bulkmap configuration file verbatim in a multi-user context, a short-term option would be to add a CLI option that would ignore the primary keys features e.g. omero metadata populate --context bulkmap --ignore-primary-keys
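A rough sketch of how such a flag could behave, using plain `argparse` rather than the real OMERO CLI plugin machinery. The `--ignore-primary-keys` option is only the proposal above and does not exist yet; the `advanced` config key and both helper functions are assumptions made for illustration.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Stand-alone parser for illustration; not the actual `omero metadata` parser.
    parser = argparse.ArgumentParser(prog="populate-sketch")
    parser.add_argument("--context", default="bulkmap")
    parser.add_argument(
        "--ignore-primary-keys",
        action="store_true",
        help="skip the primary_group_keys handling defined in the config",
    )
    return parser


def effective_primary_group_keys(config: dict, ignore: bool) -> List[dict]:
    """Return the primary key groups to honour, or none when the flag is set."""
    if ignore:
        return []
    # Assumed layout: primary_group_keys nested under a top-level `advanced` key.
    return config.get("advanced", {}).get("primary_group_keys", [])


config = {
    "advanced": {
        "primary_group_keys": [
            {"namespace": "openmicroscopy.org/mapr/organism", "keys": ["Organism"]}
        ]
    }
}

args = build_parser().parse_args(["--context", "bulkmap", "--ignore-primary-keys"])
active_keys = effective_primary_group_keys(config, args.ignore_primary_keys)
```

The point of the design is that the IDR config file stays untouched on disk and the multi-user deployment opts out of the canonical annotation handling at invocation time.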

pwalczysko commented 4 years ago

It occurs to me that in the context of ome/omero-guide-upload#6, if one of the goals is to consume an IDR bulkmap configuration file verbatim in a multi-user context, a short-term option would be to add a CLI option that would ignore the primary keys features e.g. omero metadata populate --context bulkmap --ignore-primary-keys

Yes, thank you, I think this is a good idea.

ome / omero-metadata

default idr workflows fail in outreach situation #44