wtsi-npg / baton

iRODS client programs and API
http://wtsi-npg.github.io/baton
GNU General Public License v2.0

baton-metaset || baton-metamod --operation set #160

Closed mcast closed 4 weeks ago

mcast commented 8 years ago

I was looking for something between the two existing baton-meta* commands, more like imeta set -d...

It's just a suggestion which I hope is useful for the future. I'm expecting to go ahead with imeta set -d and absorb the per-file connection overhead by running in parallel across files.

Problems with baton-metamod --operation add

(i-cgp)mca@cgpbar:~$ jq -n '{collection: "/cgp/home/mca", data_object: "export.csv", avus: [ {a:"smippy", v:"twozzle" } ]}' | /software/npg/bin/baton-metamod --operation add --verbose 
{"avus": [{"v": "twozzle", "a": "smippy"}], "data_object": "export.csv", "collection": "/cgp/home/mca"}
(i-cgp)mca@cgpbar:~$ jq -n '{collection: "/cgp/home/mca", data_object: "export.csv", avus: [ {a:"smippy", v:"twozzle" } ]}' | /software/npg/bin/baton-metamod --operation add --verbose 
2016-05-03T13:33:00 ERROR Failed to add metadata 'smippy' -> 'twozzle' on '/cgp/home/mca/export.csv': error -809000 CATALOG_ALREADY_HAS_ITEM_BY_THAT_NAME.
{"avus": [{"v": "twozzle", "a": "smippy"}], "data_object": "export.csv", "collection": "/cgp/home/mca", "error": {"message": "Failed to add metadata 'smippy' -> 'twozzle' on '/cgp/home/mca/export.csv': error -809000 CATALOG_ALREADY_HAS_ITEM_BY_THAT_NAME", "code": -809000}}
(i-cgp)5@cgpbar:~$ 

Problem with baton-metasuper

Continuing from above,

(i-cgp)5@cgpbar:~$ jq -n '{collection: "/cgp/home/mca", data_object: "export.csv", avus: [ {a:"spram", v:"fubil" },{a:"ele",v:"phant"} ]}' | /software/npg/bin/baton-metasuper --verbose
{"avus": [{"v": "fubil", "a": "spram"}, {"v": "phant", "a": "ele"}], "collection": "/cgp/home/mca", "data_object": "export.csv"}
(i-cgp)mca@cgpbar:~$ imeta ls -d export.csv
AVUs defined for dataObj export.csv:
attribute: ele
value: phant
units: 
----
attribute: spram
value: fubil
units: 
(i-cgp)mca@cgpbar:~$ 

The smippy AVU has been lost.

Suggested enhancement

Continuing from above,

$ jq -n '{collection: "/cgp/home/mca", data_object: "export.csv", avus: [ {a:"flumm", v:"P"},{a:"ele",v:"phant"} ]}' \
 | /software/npg/bin/baton-metamod --operation set
$ jq -n '{collection: "/cgp/home/mca", data_object: "export.csv", avus: [ {a:"spram", v:null},{a:"ele",v:"baroo"} ]}' \
 | /software/npg/bin/baton-metamod --operation set

Actions

  1. set new flumm: P and no-op for ele: phant.
  2. clear spram: fubil (and any other spram keys, if there were more than one) and replace ele: phant with ele: baroo.

I'm assuming that iRODS itself can't store a null value, so we're not losing anything by extending the meaning of the null value to mean deletion.
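
The intended semantics of the two actions above can be sketched in Python against an in-memory AVU list. This is only an illustration of the proposal, not baton code: apply_set is a hypothetical helper, units are ignored, and every attribute named in a request is assumed to be replaced wholesale, with a null value meaning "clear".

```python
def apply_set(existing, requested):
    """Return the AVU list that the proposed 'set' would leave behind.

    existing:  list of {"a": ..., "v": ...} dicts currently on the object
    requested: list of {"a": ..., "v": ...} dicts; "v": None means "clear"
    (hypothetical helper; units are ignored in this sketch)
    """
    # Every attribute named in the request is replaced wholesale.
    touched = {avu["a"] for avu in requested}

    # Attributes not named in the request are left untouched.
    result = [avu for avu in existing if avu["a"] not in touched]

    # Re-add the requested values, skipping nulls (deletions) and duplicates.
    for avu in requested:
        new = {"a": avu["a"], "v": avu["v"]}
        if new["v"] is not None and new not in result:
            result.append(new)
    return result


# State after the baton-metasuper call earlier in this report.
state = [{"a": "ele", "v": "phant"}, {"a": "spram", "v": "fubil"}]
state = apply_set(state, [{"a": "flumm", "v": "P"}, {"a": "ele", "v": "phant"}])
state = apply_set(state, [{"a": "spram", "v": None}, {"a": "ele", "v": "baroo"}])
print(sorted((avu["a"], avu["v"]) for avu in state))
```

After both calls only ele: baroo and flumm: P remain, which is the outcome described under Actions.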

Rationale

The reason I would like to supersede only specified values is that I have namespaced my keys.

I would like to be able to set (add, no-op or replace) the keys with names beginning test-mca-foo. without touching any keys with names beginning test-mca-bar.

The lazier and currently more likely reason is that I want to set test-mca-foo.id_ifile and test-mca-foo.id_analysis_proc in one bite, and then set test-mca-foo.ftrk_microstat independently without having to re-calculate the intended values of the other keys in the test-mca-foo. group.

keithj commented 8 years ago

On the command line, the way I would achieve what you're describing is to preprocess the JSON document being sent to baton-metasuper to remove the properties that you don't want to modify, e.g. by passing it through jq.

For most of our complex metadata editing we use an API built on the icommands and baton. If you're using Perl, then perhaps it might be worth considering. It is rather connection-hungry, so may not be suitable if that's a concern.

colin-nolan commented 8 years ago

I have also encountered the same issue as @mcast.

To do a set (as described by @mcast), it is necessary to do 3 operations:

  1. Get the values associated with the keys that are being set.
  2. Remove these (if they exist).
  3. Then set the new values.
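
As a rough illustration of those three steps, the Python sketch below computes which AVUs would have to be removed and which added. plan_set is a hypothetical helper, not part of baton. It assumes the current AVUs have already been fetched by some other means, and that the two documents it returns would then be sent to the remove and add operations respectively.

```python
import json


def plan_set(collection, data_object, current, desired):
    """Build the 'remove' and 'add' documents for a client-side set.

    current: AVUs already fetched from iRODS (step 1, done elsewhere)
    desired: AVUs that should exist afterwards for the attributes they name
    (hypothetical helper; units are ignored in this sketch)
    """
    names = {avu["a"] for avu in desired}
    target = {"collection": collection, "data_object": data_object}

    # Step 2: remove existing values of the named attributes, unless that
    # exact AVU is wanted anyway (avoids a pointless remove and re-add).
    to_remove = [avu for avu in current
                 if avu["a"] in names and avu not in desired]
    # Step 3: add only what is missing, since add fails on an existing AVU.
    to_add = [avu for avu in desired if avu not in current]

    return dict(target, avus=to_remove), dict(target, avus=to_add)


current = [{"a": "ele", "v": "phant"}, {"a": "flumm", "v": "P"}]
desired = [{"a": "ele", "v": "baroo"}, {"a": "flumm", "v": "P"}]
rem_doc, add_doc = plan_set("/cgp/home/mca", "export.csv", current, desired)
print(json.dumps(rem_doc))
print(json.dumps(add_doc))
```

Here the remove document names only ele: phant and the add document only ele: baroo; flumm: P is already correct and is left alone.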

mcast commented 8 years ago

@colin-nolan not forgetting step 4: hope very much that no other process was doing the same thing with different changes!

We're using iRODS to implement a big bucket of files and decided to take the "patch over uncertainty and inconvenience by adding another layer" approach. So far I avoided the problem by

My only use of _set_AVUs so far is to add one AVU, so there is probably no gain unless I need to call it repeatedly on multiple files (quite possible) and feed baton a stream of jsonl/ndjson.
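
For illustration, a jsonl/ndjson stream of this kind could be generated along the following lines. This is a minimal sketch: avu_stream is a hypothetical helper, other.csv is a made-up file name, and the document shape simply copies the jq examples earlier in this thread.

```python
import json


def avu_stream(collection, data_objects, avus):
    """Yield one compact JSON document per data object, one per line."""
    for name in data_objects:
        doc = {"collection": collection, "data_object": name, "avus": avus}
        # json.dumps escapes newlines inside strings, so without an indent
        # argument each document stays on a single line.
        yield json.dumps(doc)


# "other.csv" is a hypothetical second file, used only for illustration.
lines = list(avu_stream("/cgp/home/mca", ["export.csv", "other.csv"],
                        [{"a": "smippy", "v": "twozzle"}]))
print("\n".join(lines))
```

The printed lines could then be piped into a single baton process, so the connection cost is paid once rather than once per file.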

keithj commented 8 years ago

Adding a set operation is a very worthwhile enhancement and I'll add it to the next milestone. That will have the same semantics as imeta set, which fits what you're looking for.

baton is by design a thin layer over iRODS which allows you to talk JSON as a stream, while preserving standard iRODS behaviours. Multi-valued attributes and non-idempotence of the add operation are intrinsic to iRODS and therefore I will not change these.

With respect to

'4. hope very much that no other process was doing the same thing'

That sentiment applies to anything done with iRODS metadata as currently implemented (certainly up to 4.1.9) because the ICAT behaves as if AUTOCOMMIT is enabled and no transactions are used. There is no way to prevent other clients getting caught in race conditions with your operations.

kjsanger commented 4 weeks ago

Closed as inactive.