Closed irinaColgiu closed 8 years ago
Either a single IrodsMetadata
collection is supplied for all paths, or a separate collection is supplied for each path, i.e.:
mapper.add(["/path_1", "/path_2"], metadata_for_path_1_and_2)
# Or
mapper.add(["/path_1", "/path_2"], [metadata_for_path_1, metadata_for_path_2])
If the mapper is used correctly, the len(paths) == len(metadata_for_paths)
assertion always passes. However, to help in case where it's not, e.g.
# Wrong!
mapper.add(["/path_1", "/path_2", "/path_3"], [metadata_for_path_1, metadata_for_path_2])
I'll add a check so that a more informative exception is raised than the IndexError
that you pointed out.
I don't understand, the 2 seem to me unrelated - why would the number of paths I want to modify depend on the length of metadata - ie number of avus I want to add or remove?! Most of the times I will want to add a set of 10-20-30 AVUs let's say, to 1 data_object, or remove a couple of AVUs from 1 data_object.
For me this part throws an exception both when I want to add and when I want to remove metadata:
In [16]: irods.data_object.metadata.add("/humgen/projects/serapis_staging/test-baton/test_metadata_add_rm.txt", [IrodsMetadata({'key1': {'val1'}}), IrodsMetadata({'key2': {'val2'}})])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-16-11ee5a0305c5> in <module>()
----> 1 irods.data_object.metadata.add("/humgen/projects/serapis_staging/test-baton/test_metadata_add_rm.txt", [IrodsMetadata({'key1': {'val1'}}), IrodsMetadata({'key2': {'val2'}})])
/nfs/users/nfs_i/ic4/Projects/python3/serapis/ENV/lib/python3.5/site-packages/baton/_baton/baton_metadata_mappers.py in add(self, paths, metadata)
56
57 def add(self, paths: Union[str, Iterable[str]], metadata: IrodsMetadata):
---> 58 self._modify(paths, metadata, BATON_METAMOD_OPERATION_ADD)
59
60 def set(self, paths: Union[str, Iterable[str]], metadata: IrodsMetadata):
/nfs/users/nfs_i/ic4/Projects/python3/serapis/ENV/lib/python3.5/site-packages/baton/_baton/baton_metadata_mappers.py in _modify(self, paths, metadata_for_paths, operation)
102
103
--> 104 class BatonDataObjectIrodsMetadataMapper(_BatonIrodsMetadataMapper):
105 """
106 iRODS data object metadata mapper, implemented using baton.
IndexError: list index out of range
Remember that IrodsMetadata
is a collection of key-(set)value pairs. Instead of the invalid:
irods.data_object.metadata.add("/path", [IrodsMetadata({'key1': {'val1'}}), IrodsMetadata({'key2': {'val2'}})])
you'll need:
irods.data_object.metadata.add("/path", IrodsMetadata({'key1': {'val1'}, 'key2': {'val2'}}))
I don't understand, the 2 seem to me unrelated - why would the number of paths I want to modify depend on the length of metadata - ie number of avus I want to add or remove?!
It is possible to specifiy that a single collection of metadata is to be used with each path, e.g. with the example below, metadata
is set for all paths:
mapper.set(["/path_1", "/path_2"], metadata)
The mapper also allows different metadata to be set for each path, e.g. in this example, metadata_1
is set for /path_1
and metadata_2
is set for /path_2
:
mapper.set(["/path_1", "/path_2"], [metadata_1, metadata_2])
I have made some corrections/improvements to the documentation and interface to make this clearer. I have also added the aforementioned check for an invalid set of arguments so as to raise a more informative exception.
Most of the times I will want to add a set of 10-20-30 AVUs let's say, to 1 data_object, or remove a couple of AVUs from 1 data_object.
Most of the time then you won't be using the more elaborate "multi-path, multi-metadata" functionality. Instead, you'll just need:
mapper.set("/path/data_object", metadata)
OK, I see... That wasn't clear to me from the docs...Well in this case calling it len(metadata_for_paths) in the context of:
python len(paths) == len(metadata_for_paths)
was very confusing, since metadata is a dict, the length of a dict is the number of elements in it, which is the nr of attribute names (keys in this case).
As an observation, I don't think this:
mapper.set(["/path_1", "/path_2"], metadata)
is going to be used, as the metadata for 2 files will always be different. I can't think of any use case in which there would be identical metadata for 2 data objects/collections.
I think in the case of set method there is too much functionality crowded on the same method, and it makes it confusing as to what it does, in the sense that it's not obvious if/how does the behavior of the set method change depending on the types of the params, I mean if I were to list all of the functionality of 1 method:
which a) looks to me like overloading a method in Java, (which is not necessarily bad, just smells like Java more than Python) and b) to me it would be more clear if you just call them differently, because the mental effort to understand how does the functionality differ in each case is bigger the more cases you have, and can be avoided by giving each a relevant name - e.g. - set(list1, list2) could be called set_all, and set(list1, object) -> set_to_each, or smth like that. IMHO.
As an observation, I don't think this:
mapper.set(["/path_1", "/path_2"], metadata)
is going to be used, as the metadata for 2 files will always be different. I can't think of any use case in which there would be identical metadata for 2 data objects/collections.
set
could be used to clear the metadata to an inital shared set of shared key/values, e.g. study_id
, upon unique key/values may be written for each entity? Probably not as useful as the equivalent add
method though, which could add shared key/values on top of existing keys.
looks to me like overloading a method in Java...
I guess the more Pythonic way is to overload methods by use of different parameters, e.g.:
def add(self, path: str=None, paths: Iterable[str]=None, metadata: IrodsMetadata=None, metadata_list:List[IrodsMetadata]=None)
# Instead of
def add(self, paths: Union[str, Iterable[str]], metadata: Union[IrodsMetadata, List[IrodsMetadata]])
It can certainly make the parameters themselves clearer. However, it comes at the cost of making the input to the function more opaque - the former method signiture offers no hint to the minimal set of arguments that must be supplied or of what combinations are valid.
The way this currently is makes it consistent with the rest of the interface. I do agree: it is indeed consistently more Javary than Pythonic!
to me it would be more clear if you just call them differently, because the mental effort to understand how does the functionality differ in each case is bigger the more cases you have, and can be avoided by giving each a relevant name
I'm not sure such a change would make things too much better: the mental effort is shifted from knowing about the purpose of the parameters to knowing about the purpose of a collection of related functions.
I am pretty sure this is a bug, though i haven't investigated it very thoroughly. So you iterate with i which takes values in the range of (0, nrof AVUs (called metadata_for_paths in the code)), and then you attempt to access the i-th element of the paths.
This is obviously only going to work when you test with nr_of_avus = nr_of_paths, otherwise if there is 1 path and 3 AVUs one gets a IndexError:
from: https://github.com/wtsi-hgi/python-baton-wrapper/blob/master/baton/_baton/baton_metadata_mappers.py#L87-L88