Closed mstfdkmn closed 2 months ago
We'll look into this soon.
Putting this in 2.1.0, but might pull it in to 2.0.1 if time allows.
Hi @mstfdkmn, I know it's been a little while, but if you've still got access to this test data or can re-create these conditions on the system in question, I'd be interested in seeing the script /home/u0137480/move.py, or at the very least the operative part where the error occurred.
To answer your other question, I don't think there is an "expected" threshold for number, and I wouldn't expect total data size to matter either. It's just the coll_name and parent_coll_name being updated for each DB row describing a sub-collection. In fact, with fewer than 37000 data objects and 2500 subcollections, I wouldn't have expected the server-side socket to time out either, so I'm definitely confused.
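To make the point about coll_name and parent_coll_name concrete, here is a minimal sketch in plain Python of the kind of per-row prefix rewrite a rename implies. It is only an illustration under assumptions: CollRow, rename_rows and the sample paths are hypothetical, and this is not the server's actual catalog code. It does suggest why the work should scale with the number of sub-collections rather than with data size.

```python
from dataclasses import dataclass


@dataclass
class CollRow:
    # Hypothetical stand-in for one catalog row describing a collection.
    coll_name: str
    parent_coll_name: str


def _swap_prefix(path, old, new):
    """Replace the leading `old` path with `new`, matching whole path segments only."""
    if path == old:
        return new
    if path.startswith(old + "/"):
        return new + path[len(old):]
    return path


def rename_rows(rows, old, new):
    """Rewrite both columns of every affected row; return how many rows changed."""
    touched = 0
    for row in rows:
        new_name = _swap_prefix(row.coll_name, old, new)
        new_parent = _swap_prefix(row.parent_coll_name, old, new)
        if (new_name, new_parent) != (row.coll_name, row.parent_coll_name):
            row.coll_name, row.parent_coll_name = new_name, new_parent
            touched += 1
    return touched


# Made-up sample rows: the renamed collection, two nested sub-collections,
# and a sibling whose name merely shares a textual prefix.
rows = [
    CollRow("/zone/home/a/testdata", "/zone/home/a"),
    CollRow("/zone/home/a/testdata/sub1", "/zone/home/a/testdata"),
    CollRow("/zone/home/a/testdata/sub1/deep", "/zone/home/a/testdata/sub1"),
    CollRow("/zone/home/a/testdata_old", "/zone/home/a"),
]
touched = rename_rows(rows, "/zone/home/a/testdata", "/zone/home/a/testdata_rename")
```

In this toy model only the three rows at or below the renamed collection change; the sibling testdata_old is left alone because the match is on whole path segments.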
Please include the types of resources touched during the rename.
I am not sure I understood exactly what you meant here, but if you meant I need to specify a resource name, I guess there is no option/parameter available to pass the resource name to the move() method.
Hi, I reproduced this with the same data in our pilot zone (in the past this zone had the replication resource, so some of the data objects used in this test have multiple replicas).
Seeing the script /home/u0137480/move.py or at very least the operative part where the error occurred.
root@33d6bc1286de:/# cat move.py
import os, os.path
from irods.session import iRODSSession
env_file = os.getenv('IRODS_ENVIRONMENT_FILE', os.path.expanduser('~/.irods/irods_environment.json'))
with iRODSSession(irods_env_file=env_file) as session:
    session.collections.move("/kuleuven_tier1_pilot/home/datateam/testdata", "/kuleuven_tier1_pilot/home/datateam/testdata_rename")
Knowing iRODS server version.
>>> session.server_version
(4, 3, 1)
Knowing the total time imv took to (successfully) rename the collection. Took about 35 minutes.
Another input: I was able to quickly rename a collection that contains a single large file (2.5TB) in another zone that doesn't have the replication resource.
Unfortunately, I have not yet been able to transfer the dataset I used in our pilot zone to another zone which doesn't have replication. Once I transfer this data, I will be able to repeat my test with the same data to exclude the potential effect of replication.
Please include the types of resources touched during the rename.
I am not sure I understood exactly what you meant here, but if you meant I need to specify a resource name, I guess there is no option/parameter available to pass the resource name to the move() method.
I see how that can be confusing. I'll try and explain what I was thinking ...
If you rename a data object which has replicas in a unixfilesystem resource, the default behavior is to update the catalog and the replicas in storage. The s3 resource may do something similar.
So, the behavior of the rename operation depends on the storage resources involved and the server's configuration. That's why I was curious to know what resource types may have been involved in the rename.
After re-reading the issue, I don't think it's important to know about the resources, since imv succeeded.
Leaving it to @d-w-moore.
Since we have not been able to reproduce it, I'm thinking the milestone should be 2.1.1
Let's keep this here for now - it may be confirmed to be fixed / good by release time.
As 2.1.0 approaches... we can bump to 2.1.1 if we have no new information.
Agreed ... it leaves room for more progress.
@mstfdkmn - though I wasn't able to reproduce, I'm pretty confident this issue has the same fix as #564. With a patch from pull request #570 (which guarantees connection_timeout applies to all connections whether new or cached) I believe all you would need to do is modify your renaming script, in your reply I've quoted below, to do this:
session.connection_timeout = None
just prior to the session.collections.move( ... ) call.
If you'd like to try it on your end, I'd welcome any further input and/or news on results from this proposed experiment (especially since it's likely that I personally don't have access to any iRODS collections heavy enough to evoke the NetworkException on renaming).
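For reference, the suggested change applied to the renaming script might look roughly like the sketch below. It assumes a python-irodsclient build that includes the patch from pull request #570. The helper name rename_without_timeout and the main() wrapper are hypothetical, introduced here only for illustration; the paths are the ones from the quoted script.

```python
import os


def rename_without_timeout(session, src, dst):
    """Disable the client-side socket timeout, then rename the collection.

    `session` is expected to be an irods.session.iRODSSession. Setting
    connection_timeout to None is the change proposed in this thread; it is
    assumed to reach cached connections only with the patch from PR #570.
    """
    session.connection_timeout = None
    session.collections.move(src, dst)


def main():
    # Imported here so the helper above can be loaded without the client installed.
    from irods.session import iRODSSession

    env_file = os.getenv(
        "IRODS_ENVIRONMENT_FILE",
        os.path.expanduser("~/.irods/irods_environment.json"),
    )
    with iRODSSession(irods_env_file=env_file) as session:
        rename_without_timeout(
            session,
            "/kuleuven_tier1_pilot/home/datateam/testdata",
            "/kuleuven_tier1_pilot/home/datateam/testdata_rename",
        )
```

Calling main() against a real zone performs the rename. With no timeout a stalled connection will wait indefinitely, so a large finite value may be a safer choice outside of an experiment.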
Hi, I reproduced this with the same data in our pilot zone (in the past this zone had the replication resource, so some of the data objects used in this test have multiple replicas).
Seeing the script /home/u0137480/move.py or at very least the operative part where the error occurred.
root@33d6bc1286de:/# cat move.py
import os, os.path
from irods.session import iRODSSession
env_file = os.getenv('IRODS_ENVIRONMENT_FILE', os.path.expanduser('~/.irods/irods_environment.json'))
with iRODSSession(irods_env_file=env_file) as session:
    session.collections.move("/kuleuven_tier1_pilot/home/datateam/testdata", "/kuleuven_tier1_pilot/home/datateam/testdata_rename")
Knowing iRODS server version.
>>> session.server_version
(4, 3, 1)
Knowing the total time imv took to (successfully) rename the collection. Took about 35 minutes.
Another input: I was able to quickly rename a collection that contains a single large file (2.5TB) in another zone that doesn't have the replication resource.
Unfortunately, I have not yet been able to transfer the dataset I used in our pilot zone to another zone which doesn't have replication. Once I transfer this data, I will be able to repeat my test with the same data to exclude the potential effect of replication.
Dear @d-w-moore , indeed it worked, I could rename the same collection with the fix in your branch. Thanks.
@mstfdkmn Excellent! Closing this issue.
Resolved by changes in https://github.com/irods/python-irodsclient/pull/570.
Hi, when I tried to rename (not move, i.e. copy/remove) a heavy collection (large in size, with many nested sub-items), the PRC failed, whereas I could rename it with imv. Here is some information about the collection that I wanted to rename:
Here is what I got as client:
my first attempt:
my next attempts:
Here are the iRODS logs:
This is not a move (copy/remove from source to target) but a rename (a change of name, i.e. a change of records in the database), like here: https://github.com/irods/python-irodsclient/blob/main/irods/test/collection_test.py#L95
Is this something expected? If yes, can we know the thresholds? Does the size of a collection matter, or the number of sub-items in it?
By the way, our iRODS catalog is MySQL. Thanks.