irods / python-irodsclient

A Python API for iRODS

replica.resource_name only displays root of resource tree on compound resource hierarchy #127

Open kript opened 6 years ago

kript commented 6 years ago

Example script:

from __future__ import print_function
import os
from irods.session import iRODSSession
try:
    env_file = os.environ['IRODS_ENVIRONMENT_FILE']
except KeyError:
    env_file = os.path.expanduser('~/.irods/irods_environment.json')

session = iRODSSession(irods_env_file=env_file)

obj = session.data_objects.get("/seq/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram")

print("%s/%s" % (obj.collection, obj.name))

for replica in obj.replicas:
    print("repl no: %s" % (replica.number))
    print("resc: %s" % (replica.resource_name))
    print("repl path: %s" % (replica.path))
    print("repl status: %s" % (replica.status))
    print("repl checksum: %s" % (replica.checksum))

Sample output:

./test_python.py
<iRODSCollection 151924721 14200212.HXV2.paired308.19071ecea2>/14200212.HXV2.paired308.19071ecea2.cram
repl no: 0
resc: root
repl path: /irods-seq-i24-bc/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
repl status: 1
repl checksum: 77271d5bc49617ac1c7dbf0b5c832a2b
repl no: 1
resc: root
repl path: /irods-seq-sr02-ddn-rd10-30-31-32/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
repl status: 1
repl checksum: None
repl no: 2
resc: root
repl path: /irods-seq-sr02-ddn-rd10-36-37-38/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
repl status: 1
repl checksum: None
repl no: 3
resc: root
repl path: /irods-seq-sr08-de/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
repl status: 1
repl checksum: 77271d5bc49617ac1c7dbf0b5c832a2b
[<iRODSMeta 151934015 md5 77271d5bc49617ac1c7dbf0b5c832a2b None>]

Equivalent ils output:

ils -L seq/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
ERROR: lsUtil: srcPath /seq/home/irods-g1/seq/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram does not exist or user lacks access permission
(virtualenv_prc_0.8.0)irods-g1@irods-g1:~/sanger-scripts$ ils -L /seq/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
  srpipe            0 root;replicate;seq-red;red4;irods-seq-i24-bc  18396941981 2018-05-01.12:10 & 14200212.HXV2.paired308.19071ecea2.cram
    77271d5bc49617ac1c7dbf0b5c832a2b    generic    /irods-seq-i24-bc/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
  srpipe            1 root;replicate;seq-green;green2;irods-seq-sr02-ddn-rd10-30-31-32            0 2018-05-01.12:33 & 14200212.HXV2.paired308.19071ecea2.cram
        generic    /irods-seq-sr02-ddn-rd10-30-31-32/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
  srpipe            2 root;replicate;seq-green;green2;irods-seq-sr02-ddn-rd10-36-37-38  18396941981 2018-05-01.14:35 & 14200212.HXV2.paired308.19071ecea2.cram
        generic    /irods-seq-sr02-ddn-rd10-36-37-38/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram
  srpipe            3 root;replicate;seq-green;green5;irods-seq-sr08-de  18396941981 2018-05-01.14:38 & 14200212.HXV2.paired308.19071ecea2.cram
    77271d5bc49617ac1c7dbf0b5c832a2b    generic    /irods-seq-sr08-de/illumina/library_merge/14200212.HXV2.paired308.19071ecea2/14200212.HXV2.paired308.19071ecea2.cram

alanking commented 6 years ago

Are you running 4.1.x here? This seems to be the issue in pre-4.2 versions where the resource associated with a data object is the root resource in the hierarchy (causes irods/irods#3419, irods/irods#3503, etc.).

The get operation here runs a query to fetch all information in the catalog for the specified data object. The root resource of the hierarchy is stored in the resource_name column in 4.1.

If you print out the resc_hier instead, you can see the resource which actually stores the data object (and could use some python magic to just show the leaf resc if you want).
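As a rough illustration of the "python magic" for the leaf resource: the hierarchy strings in the ils output above are `;`-separated, with the root first and the storage (leaf) resource last. A minimal sketch, assuming that format holds; `leaf_resource` is a hypothetical helper name, not part of python-irodsclient:

```python
def leaf_resource(resc_hier):
    """Return the leaf (last) resource name of a ';'-separated hierarchy string.

    Hypothetical helper for illustration; assumes the root-first,
    ';'-separated format shown by `ils -L`.
    """
    return resc_hier.split(";")[-1]


# Hierarchy string copied from replica 0 in the ils output above.
hier = "root;replicate;seq-red;red4;irods-seq-i24-bc"
print(leaf_resource(hier))  # irods-seq-i24-bc
```

A hierarchy with a single resource (no `;`) is returned unchanged.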

~Note that neither resc_hier nor resource_name are stored in the catalog in 4.2 (sort of), so this is a 4.1-only problem/solution.~ While this is technically true, you can still query for such information in 4.2. Sorry about that.

kript commented 6 years ago

Thanks @jaking92 - yes we're running 4.1.10 live, and 4.1.11 in dev. I can't see resc_hier defined in the iRODS DataObject, so I'm not sure how to expose that; could you give an example?

alanking commented 6 years ago

Hmm... fair point. It would seem that resc_hier is not part of iRODSReplica either. The only way I can think to do this presently would be via some sort of query with DataObject.resc_hier and the information you do have from the replicas in your example code above.
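A query along those lines might look like the sketch below. It is untested against a 4.1 server and assumes that `DataObject.resc_hier` is exposed as a column in `irods.models` for the client version in use. `split_logical_path` and `replica_hierarchies` are hypothetical helper names, not part of python-irodsclient.

```python
import posixpath


def split_logical_path(logical_path):
    """Split an iRODS logical path into (collection path, data object name)."""
    return posixpath.split(logical_path)


def replica_hierarchies(session, logical_path):
    """Map each replica number of a data object to its resource hierarchy.

    Assumes DataObject.resc_hier is a queryable column in this client version.
    """
    # Imported here so split_logical_path works without python-irodsclient.
    from irods.models import Collection, DataObject

    coll_name, data_name = split_logical_path(logical_path)
    query = (
        session.query(DataObject.replica_number, DataObject.resc_hier)
        .filter(Collection.name == coll_name)
        .filter(DataObject.name == data_name)
    )
    # Each result row is indexed by the column objects that were queried.
    return {row[DataObject.replica_number]: row[DataObject.resc_hier] for row in query}
```

With the `session` and logical path from the example script, `replica_hierarchies(session, path)` should return one entry per replica, which can then be matched against `replica.number` in the loop over `obj.replicas`.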

I hesitate to add a resc_hier attribute to iRODSReplica at this point since resource_name is functional on 4.2+, but to your point, ils has this info (and more). I'm going to hand off to @trel or @d-w-moore here, as they will have more-informed opinions than I do.