ome / design

OME Design proposals
http://ome.github.io/design/
1 stars 15 forks source link

Populate metadata - idempotent #66

Open atarkowska opened 7 years ago

atarkowska commented 7 years ago

Populate_metadata plugin allows populate map annotations based on a bulk file with the given context:

# step 1. create bulk file
bin/omero metadata populate --file /path/to/annotation.csv Screen:ID
# step 2. populate maps
bin/omero metadata populate --context bulkmap --cfg /path/to/bulkmap-config.yml Screen:ID

Second step could be idempotent, rerunning the same command bin/omero metadata populate --context bulkmap --cfg /path/to/bulkmap-config.yml Screen:ID shouldn't thrown any exceptions.

cc @manics @joshmoore @eleanorwilliams

manics commented 7 years ago

Issues to consider:

atarkowska commented 7 years ago

I am not sure at the moment about updates, this is much more complicated. Keep in mind that annotations are shared and if PK is updated then this will be completely new annotation.

atarkowska commented 7 years ago

One possible solution for update could be that if namespace is considered unique per Image/Well then script could proceed as normal, deleting/unlinking (see delete issues in https://github.com/openmicroscopy/design/issues/67) and creating new one

manics commented 7 years ago

If the requirement is to only create an annotation for namespace 'xxxx' if an image/well doesn't already have an annotation in xxxx regardless of whether it's up to date or not that should be easier to implement.

atarkowska commented 7 years ago

So going more to the details

Use case 1: general o.org/omero/bulk_annotations

Modifying config as follow:

---
columns:
  - name: Channels
     include: yes
+ - name: Comment
+    include: yes

all of the existing openmicroscopy.org/omero/bulk_annotation would need to be reviewed and updated with the respect of omitempty flag. And the same if column is removed. That is simple usecase as there is always single AnnotationLink

The same use case but PK:

   - group:
       namespace: openmicroscopy.org/mapr/sirna
       columns:
       - name: siRNA Identifier
         include: yes
+        omitempty: no
+      - name: siRNA Identifier
+        clientname: siRNA Pool Identifier
+        clientvalue: ""
+        include: yes
+        omitempty: no

There are 2 cases here:

Use case 2:

Modifying config as follow:

   - group:
       namespace: openmicroscopy.org/mapr/gene
       columns:
       - name: Gene Identifier
         include: yes
+        omitempty: no
       - name: Gene Identifier
         clientname: Gene Identifier URL
         clientvalue: http://www.ensembl.org/id/{{ value|urlencode }}
         include: yes
       - name: Gene Symbol
         include: yes
+        omitempty: no

I think this is simple as it would need to add whatever was missing.

Use case 3:

Modifying config as follow:

---
columns:
  - name: Channels
     include: yes
  - group:
      namespace: openmicroscopy.org/mapr/sirna
      columns:
      - name: siRNA Identifier
        include: yes
+       omitempty: no
+       split: ;

script would need to review and decide if map already exists delete or update? Perhaps depends if this is PK and Map is linked to multiple images or not

 advanced:
     ignore_missing_primary_key: yes
     primary_group_keys:
     - namespace: openmicroscopy.org/mapr/sirna
       keys:
       - siRNA Identifier

The same use case can be analyze in opposite way where split: ; is removed from the config.

Use case 4:

Updating cvs -> bulk_annotations, no changes to the config.

In general I think each map could be reviewed if PK is changing or not:

atarkowska commented 7 years ago

What happens if the primary key definition in one config file is changed? Ideally the primary keys should be defined in a global file but is there a short-term workaround? If we do implement global primary keys should it be possible to change the definition at all?

Existing implementation doesn't allow to define different PK for the same NS, so it looks like it should be global. It is very likely to get an exception https://github.com/openmicroscopy/openmicroscopy/blob/metadata52/components/tools/OmeroPy/src/omero/util/metadata_mapannotations.py#L218.

Would be sufficient to extend default config by adding primary_group_keys and then inherit it in each individual study?


- default config - minimum predefiend by the plugin
  - global config - any attributes that a common including columns, 
    - study config

We have to have a way of updating PKs otherwise fixing bugs as above won't be possible. 
eleanorwilliams commented 7 years ago

Two concrete examples of usecase 4 that I have:

1. updating some gene symbols In this case 5 genes, corresponding to 5 wells need their gene symbols updated. Changed in CSV but no bulkmap config change. PR is here https://github.com/IDR/idr-metadata/pull/198. Only 5 values out of about 22000 for gene symbol will change.

2. changing channel information annotations I'm about to go through all the channel information annotations to look at consistency in naming etc. Its possible that I might update a csv changing the 'Channel' information values for all wells.

I am happy with any of the following but

i. deleting all annotations for the whole screen and reloading (obviously this will take time in some screens)

ii. allowing deletion of a certain annotation type e.g. Channel info or Gene Identifier/Gene Symbol and reloading just that

iii. allowing deletion of a certain annotation type at a more precise level e.g. on certain wells for the case of Gene Symbol changing in a small % of wells.

atarkowska commented 7 years ago

I am happy with any of the following but i. deleting all annotations for the whole screen and reloading (obviously this will take time in some screens) ii. allowing deletion of a certain annotation type e.g. Channel info or Gene Identifier/Gene Symbol and reloading just that iii. allowing deletion of a certain annotation type at a more precise level e.g. on certain wells for the case of Gene Symbol changing in a small % of wells.

As we don't operate on diffs but entire bulk file I think entire screen has to be scanned everytime to detect what data has changed. Do we have any way to detect the diff?

atarkowska commented 7 years ago

Just giving more feedback. running populate metadata against already annotated data result in

Traceback (most recent call last):
  File "/home/omero/workspace/OMERO-server/OMERO.server/bin/omero", line 125, in <module>
    rv = omero.cli.argv()
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1489, in argv
    cli.invoke(args[1:])
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 974, in invoke
    stop = self.onecmd(line, previous_args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1051, in onecmd
    self.execute(line, previous_args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/cli.py", line 1133, in execute
    args.func(args)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/plugins/metadata.py", line 439, in populate
    ctx.write_to_omero(batch_size=args.batch, loops=loops, ms=ms)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/util/populate_metadata.py", line 1406, in write_to_omero
    sz = self._save_annotation_and_links(batch, ma, batch_size)
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero/util/populate_metadata.py", line 1287, in _save_annotation_and_links
    batch, {'omero.group': group})
  File "/home/omero/workspace/OMERO-server/OMERO.server/lib/python/omero_api_IUpdate_ice.py", line 138, in saveArray
    return _M_omero.api.IUpdate._op_saveArray.invoke(self, ((graph, ), _ctx))
omero.ValidationException: exception ::omero::ValidationException
{
    serverStackTrace = ome.conditions.ValidationException: could not insert: [ome.model.annotations.ImageAnnotationLink]; SQL [insert into imageannotationlink (child, creation_id, external_id, group_id, owner_id, permissions, update_id, parent, version, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [imageannotationlink_parent_child_owner_id_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.annotations.ImageAnnotationLink]
    at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:637)
    at org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:412)
    at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:117)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:108)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:249)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:121)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    at com.sun.proxy.$Proxy91.saveArray(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:93)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    at com.sun.proxy.$Proxy91.saveArray(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor1086.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:172)
    at ome.services.throttling.Callback.run(Callback.java:56)
    at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
    at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:140)
    at ome.services.blitz.impl.UpdateI.saveArray_async(UpdateI.java:68)
    at sun.reflect.GeneratedMethodAccessor1085.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at omero.cmd.CallContext.invoke(CallContext.java:78)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    at com.sun.proxy.$Proxy92.saveArray_async(Unknown Source)
    at omero.api._IUpdateTie.saveArray_async(_IUpdateTie.java:98)
    at omero.api._IUpdateDisp.___saveArray(_IUpdateDisp.java:212)
    at omero.api._IUpdateDisp.__dispatch(_IUpdateDisp.java:387)
    at IceInternal.Incoming.invoke(Incoming.java:221)
    at Ice.ConnectionI.invokeAll(ConnectionI.java:2536)
    at Ice.ConnectionI.dispatch(ConnectionI.java:1145)
    at Ice.ConnectionI.message(ConnectionI.java:1056)
    at IceInternal.ThreadPool.run(ThreadPool.java:395)
    at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
    at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:832)
    at java.lang.Thread.run(Thread.java:745)

    serverExceptionClass = ome.conditions.ValidationException
    message = could not insert: [ome.model.annotations.ImageAnnotationLink]; SQL [insert into imageannotationlink (child, creation_id, external_id, group_id, owner_id, permissions, update_id, parent, version, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [imageannotationlink_parent_child_owner_id_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.annotations.ImageAnnotationLink]
}