hpcleuven / vsc-python-irodsclient

VSC Python iRODS client
GNU Lesser General Public License v3.0

vsc-prc-size behaves strange #2

Open ingridbr opened 4 years ago

ingridbr commented 4 years ago

When executing a vsc-prc-size command in a directory I get an error and the size of a file that does not exist in this directory. The directory I am using:

/kuleuven_tier1_pilot/home/vsc30706/testcol:
  10G.dat
  10M.dat
  1G.dat
  1M.dat
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/500GB-in-large-files
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/50GB-in-medium-files
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/5GB-in-small-files
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/5MB-in-tiny-files
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/Climate-Huge
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/Climate-Large
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/Climate-Medium
  C- /kuleuven_tier1_pilot/home/vsc30706/testcol/Climate-Small

When I execute vsc-prc-size -H -r "./*.dat"

I get as result:

93G ./100G.dat
Traceback (most recent call last):
  File "/apps/leuven/common/software/vsc-python-irodsclient/development/vsc-python-irodsclient/tools/vsc-prc-size", line 52, in <module>
    for path, size in iterator:
  File "/apps/leuven/common/software/vsc-python-irodsclient/development/vsc-python-irodsclient/lib/vsc_irods/manager/bulk_manager.py", line 430, in size
    assert len(results) == 1
AssertionError

The verification that I am in the right directory:

ipwd
/kuleuven_tier1_pilot/home/vsc30706/testcol

The file 100G.dat does not exist in this directory (but it does exist in my home):

ils
/kuleuven_tier1_pilot/home/vsc30706:
  100G.dat
  100M.dat
  10G.dat
  10M.dat
  1G.dat
  1M.dat
  50M.dat
  CTBY_a_n.nc

MaximeVdB commented 4 years ago

Hi Ingrid -- as we discussed earlier, you're seeing the size of a file in your irods_home because VSC-PRC does not yet inherit the current working directory from your 'regular' iRODS session. Starting out with the irods_home as the irods_cwd is fine when users are using VSC-PRC in their Python scripts. But when using the command-line tools, syncing with the irods_cwd of their shell session is indeed better (more powerful and probably expected by most users). This will be addressed soon.
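As a rough illustration of what syncing with the shell session could look like (this is not the VSC-PRC implementation): the sketch below assumes that the iCommands record the per-shell working collection under an `irods_cwd` key in a session file named `irods_environment.json.<shell PID>` inside `~/.irods`. That file layout, the function name `read_session_cwd` and its parameters are assumptions made for this example.

```python
import json
import os


def read_session_cwd(env_dir, shell_pid, default):
    """Return the iRODS working collection recorded for a shell session.

    Assumption: the session file is `irods_environment.json.<shell_pid>`
    in `env_dir` and stores the working collection under "irods_cwd".
    Falls back to `default` (e.g. irods_home) if the file is missing,
    unreadable as JSON, or has no such key.
    """
    session_file = os.path.join(env_dir, f"irods_environment.json.{shell_pid}")
    try:
        with open(session_file) as handle:
            return json.load(handle).get("irods_cwd", default)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```

A command-line tool could then call something like `read_session_cwd(os.path.expanduser("~/.irods"), os.getppid(), irods_home)`, while Python scripts keep starting from the irods_home.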

The AssertionError occurring at the data object (100M.dat) is not related to this. What's happening here is that (in contrast to 100G.dat), this file has been replicated (two replicas in total), yielding two hits in a query where only one is expected. Such scenarios haven't yet been considered in VSC-PRC, and I'll work on that as well.

ingridbr commented 4 years ago

Ok, thanks Maxime. The replica issue certainly has to be fixed, as in our iRODS system we create 2 replicas of every file by default :-)

MaximeVdB commented 4 years ago

After having a closer look, I noticed the bulk.size() operation did work correctly with multiple replicas. The origin of your error lies in the fact that the 2 replicas of e.g. your "~/100M.dat" object have different sizes (0 and 1000000). I suppose this is a sign that this object has been corrupted in some way? For now, I've just added a more descriptive error message than the plain AssertionError.
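The kind of check described here could be sketched as follows. This is an illustration only, not the actual code in bulk_manager.py: the function name `size_from_replicas` and the `(replica_number, size)` tuple layout are hypothetical.

```python
def size_from_replicas(path, replicas):
    """Return the size of a data object given its replica records.

    `replicas` is a list of (replica_number, size_in_bytes) tuples.
    This layout is hypothetical, not the VSC-PRC internal format.
    """
    if not replicas:
        raise ValueError(f"no replicas found for {path}")

    # Several replicas are fine as long as they all report the same size.
    sizes = {size for _, size in replicas}
    if len(sizes) != 1:
        details = ", ".join(
            f"replica {number}: {size} bytes" for number, size in sorted(replicas)
        )
        raise ValueError(
            f"replicas of {path} have different sizes ({details}); "
            "the data object may be corrupted"
        )
    return sizes.pop()
```

With the sizes reported above, `size_from_replicas("~/100M.dat", [(0, 0), (1, 1000000)])` raises a `ValueError` that names both replicas instead of failing on a bare assertion.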