Open atarkowska opened 7 years ago
Issues to consider:
bulkmap-config.yml
are changed or deleted (but not added) should the annotations be modified/deleted/recreated?I am not sure at the moment about updates, this is much more complicated. Keep in mind that annotations are shared and if PK is updated then this will be completely new annotation.
One possible solution for update could be that if namespace is considered unique per Image/Well then script could proceed as normal, deleting/unlinking (see delete issues in https://github.com/openmicroscopy/design/issues/67) and creating new one
If the requirement is to only create an annotation for namespace 'xxxx' if an image/well doesn't already have an annotation in xxxx
regardless of whether it's up to date or not that should be easier to implement.
So going more to the details
o.org/omero/bulk_annotations
Modifying config as follow:
---
columns:
- name: Channels
include: yes
+ - name: Comment
+ include: yes
all of the existing openmicroscopy.org/omero/bulk_annotation
would need to be reviewed and updated with the respect of omitempty
flag. And the same if column is removed. That is simple usecase as there is always single AnnotationLink
The same use case but PK:
- group:
namespace: openmicroscopy.org/mapr/sirna
columns:
- name: siRNA Identifier
include: yes
+ omitempty: no
+ - name: siRNA Identifier
+ clientname: siRNA Pool Identifier
+ clientvalue: ""
+ include: yes
+ omitempty: no
There are 2 cases here:
Modifying config as follow:
- group:
namespace: openmicroscopy.org/mapr/gene
columns:
- name: Gene Identifier
include: yes
+ omitempty: no
- name: Gene Identifier
clientname: Gene Identifier URL
clientvalue: http://www.ensembl.org/id/{{ value|urlencode }}
include: yes
- name: Gene Symbol
include: yes
+ omitempty: no
I think this is simple as it would need to add whatever was missing.
Modifying config as follow:
---
columns:
- name: Channels
include: yes
- group:
namespace: openmicroscopy.org/mapr/sirna
columns:
- name: siRNA Identifier
include: yes
+ omitempty: no
+ split: ;
script would need to review and decide if map already exists delete or update? Perhaps depends if this is PK and Map is linked to multiple images or not
advanced:
ignore_missing_primary_key: yes
primary_group_keys:
- namespace: openmicroscopy.org/mapr/sirna
keys:
- siRNA Identifier
The same use case can be analyze in opposite way where split: ;
is removed from the config.
Updating cvs -> bulk_annotations, no changes to the config.
In general I think each map could be reviewed if PK is changing or not:
What happens if the primary key definition in one config file is changed? Ideally the primary keys should be defined in a global file but is there a short-term workaround? If we do implement global primary keys should it be possible to change the definition at all?
Existing implementation doesn't allow to define different PK for the same NS, so it looks like it should be global. It is very likely to get an exception https://github.com/openmicroscopy/openmicroscopy/blob/metadata52/components/tools/OmeroPy/src/omero/util/metadata_mapannotations.py#L218.
Would be sufficient to extend default config by adding primary_group_keys
and then inherit it in each individual study?
- default config - minimum predefiend by the plugin
- global config - any attributes that a common including columns,
- study config
We have to have a way of updating PKs otherwise fixing bugs as above won't be possible.
Two concrete examples of usecase 4 that I have:
1. updating some gene symbols In this case 5 genes, corresponding to 5 wells need their gene symbols updated. Changed in CSV but no bulkmap config change. PR is here https://github.com/IDR/idr-metadata/pull/198. Only 5 values out of about 22000 for gene symbol will change.
2. changing channel information annotations I'm about to go through all the channel information annotations to look at consistency in naming etc. Its possible that I might update a csv changing the 'Channel' information values for all wells.
I am happy with any of the following but
i. deleting all annotations for the whole screen and reloading (obviously this will take time in some screens)
ii. allowing deletion of a certain annotation type e.g. Channel info or Gene Identifier/Gene Symbol and reloading just that
iii. allowing deletion of a certain annotation type at a more precise level e.g. on certain wells for the case of Gene Symbol changing in a small % of wells.
I am happy with any of the following but i. deleting all annotations for the whole screen and reloading (obviously this will take time in some screens) ii. allowing deletion of a certain annotation type e.g. Channel info or Gene Identifier/Gene Symbol and reloading just that iii. allowing deletion of a certain annotation type at a more precise level e.g. on certain wells for the case of Gene Symbol changing in a small % of wells.
As we don't operate on diffs but entire bulk file I think entire screen has to be scanned everytime to detect what data has changed. Do we have any way to detect the diff?
Just giving more feedback. running populate metadata against already annotated data result in
Traceback (most recent call last):
File "/home/omero/workspace/OMERO-server/OMERO.server/bin/omero", line 125, in <module>
rv = omero.cli.argv()
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1489, in argv
cli.invoke(args[1:])
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 974, in invoke
stop = self.onecmd(line, previous_args)
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1051, in onecmd
self.execute(line, previous_args)
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1133, in execute
args.func(args)
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/plugins/metadata.py", line 439, in populate
ctx.write_to_omero(batch_size=args.batch, loops=loops, ms=ms)
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/util/populate_metadata.py", line 1406, in write_to_omero
sz = self._save_annotation_and_links(batch, ma, batch_size)
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/util/populate_metadata.py", line 1287, in _save_annotation_and_links
batch, {'omero.group': group})
File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero_api_IUpdate_ice.py", line 138, in saveArray
return _M_omero.api.IUpdate._op_saveArray.invoke(self, ((graph, ), _ctx))
omero.ValidationException: exception ::omero::ValidationException
{
serverStackTrace = ome.conditions.ValidationException: could not insert: [ome.model.annotations.ImageAnnotationLink]; SQL [insert into imageannotationlink (child, creation_id, external_id, group_id, owner_id, permissions, update_id, parent, version, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [imageannotationlink_parent_child_owner_id_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.annotations.ImageAnnotationLink]
at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:637)
at org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:412)
at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:117)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:108)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:249)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at com.sun.proxy.$Proxy91.saveArray(Unknown Source)
at sun.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:93)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at com.sun.proxy.$Proxy91.saveArray(Unknown Source)
at sun.reflect.GeneratedMethodAccessor1086.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:172)
at ome.services.throttling.Callback.run(Callback.java:56)
at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:140)
at ome.services.blitz.impl.UpdateI.saveArray_async(UpdateI.java:68)
at sun.reflect.GeneratedMethodAccessor1085.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at omero.cmd.CallContext.invoke(CallContext.java:78)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at com.sun.proxy.$Proxy92.saveArray_async(Unknown Source)
at omero.api._IUpdateTie.saveArray_async(_IUpdateTie.java:98)
at omero.api._IUpdateDisp.___saveArray(_IUpdateDisp.java:212)
at omero.api._IUpdateDisp.__dispatch(_IUpdateDisp.java:387)
at IceInternal.Incoming.invoke(Incoming.java:221)
at Ice.ConnectionI.invokeAll(ConnectionI.java:2536)
at Ice.ConnectionI.dispatch(ConnectionI.java:1145)
at Ice.ConnectionI.message(ConnectionI.java:1056)
at IceInternal.ThreadPool.run(ThreadPool.java:395)
at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:832)
at java.lang.Thread.run(Thread.java:745)
serverExceptionClass = ome.conditions.ValidationException
message = could not insert: [ome.model.annotations.ImageAnnotationLink]; SQL [insert into imageannotationlink (child, creation_id, external_id, group_id, owner_id, permissions, update_id, parent, version, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [imageannotationlink_parent_child_owner_id_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.annotations.ImageAnnotationLink]
}
Populate_metadata plugin allows populate map annotations based on a bulk file with the given context:
Second step could be idempotent, rerunning the same command
bin/omero metadata populate --context bulkmap --cfg /path/to/bulkmap-config.yml Screen:ID
shouldn't thrown any exceptions.cc @manics @joshmoore @eleanorwilliams