Open chStaiger opened 2 months ago
How did you upload the data for the iCommands example? I'm assuming you used iput, but it would be helpful to know which iCommand and options were used.
I see that REG_CHKSUM_KW is being used in the PRC put. I believe that this is equivalent to iput -k, which means...
-k checksum - calculate a checksum on the data server-side, and store
it in the catalog.
That would mean that the checksum only needs to be calculated on the server side, and it would appear that it uses the hash scheme configured for that server.
What you're looking for, I think, is the equivalent of iput -K:
-K verify checksum - calculate and verify the checksum on the data, both
client-side and server-side, and store it in the catalog.
This feature uses VERIFY_CHKSUM_KW to calculate the checksum on the client side, re-calculate it on the server side (using the same hash scheme as was used by the client-side calculation), and then ensure that they match.
You could try using VERIFY_CHKSUM_KW instead. However, DataObjectManager.put does not appear to implement the client-side checksum calculation like iput. My impression is that you can only register a checksum based on a server-side checksum calculation and there's no built-in way to verify the checksum against the local data.
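For illustration, here is a minimal sketch of what the client-side half of that verification could look like. It assumes the catalog stores SHA-256 checksums as "sha2:" followed by the base64-encoded digest and MD5 checksums as a bare hex digest. The function names are hypothetical and are not part of the PRC.

```python
import base64
import hashlib


def irods_style_checksum(path, scheme="sha2", chunk_size=1024 * 1024):
    """Compute a local checksum string in the assumed catalog format."""
    if scheme == "sha2":
        digest = hashlib.sha256()
    elif scheme == "md5":
        digest = hashlib.md5()
    else:
        raise ValueError("unsupported scheme: %r" % scheme)

    # Read in chunks so large files are not loaded into memory at once.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)

    if scheme == "sha2":
        # Assumed format: "sha2:" prefix plus base64 of the raw digest.
        return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")
    # Assumed format: MD5 is stored as a plain hex string.
    return digest.hexdigest()


def matches_server_checksum(path, server_checksum):
    """Recompute the local checksum with the scheme the server string implies."""
    scheme = "sha2" if server_checksum.startswith("sha2:") else "md5"
    return irods_style_checksum(path, scheme) == server_checksum
```

The scheme is inferred from the server's string rather than from the client environment, so the comparison does not depend on which default hash scheme either side is configured with.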
I'll mark this as a bug, but I view it more as a missing feature rather than something not working. We can play with the labels. :)
@d-w-moore - Does that seem right? Am I missing something?
I am sorry, I forgot to copy that command over. Indeed I used:
iput -K hello.txt hello_iput.txt
And the version of the icommands is 4.3.1-0~bionic.
In case this is news - there is a little section on checksums in the README...
https://github.com/irods/python-irodsclient?tab=readme-ov-file#computing-and-retrieving-checksums
@trel What's our milestone to be for this one?
Let's get the remaining issues for 2.1.1 resolved and handle this in 3.0.
Yep
I guess it makes sense for us to respect irods_match_hash_policy as well.
Let's discush first.
For pre-consideration in discush: I noticed iput has both -K (affected by client's default hash scheme) and -k (not affected), whereas istream has only -k. This doesn't mean much to me, except perhaps that it's possible the Python iRODS Client "put", being an open/write/close, may, like istream write, have different potential capabilities than an iput. FWIW....
ichksum has -K, and so that and the data object .chksum() method will probably be more our point of reference - I would hazard a guess.
@chStaiger After some discussion, we landed at the following ...
In your original issue, you're comparing iput to PRC put. iput uses the PUT API whereas the PRC put uses open/write/close (i.e. streaming operations). The streaming operations do not support client-side checksum operations like iput.
You'd need to provide your own implementation for the behavior you're describing.
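One way such an implementation might look, as a rough sketch only: upload with a server-side checksum registered, read the registered value back, and compare it with a checksum computed locally. It assumes that "regChksum" is the string behind irods.keywords.REG_CHKSUM_KW, that the registered value is exposed as the data object's checksum attribute, and that sha2 values are stored as "sha2:" plus the base64 digest. The helper name put_and_verify is hypothetical.

```python
import base64
import hashlib


def put_and_verify(session, local_path, logical_path):
    """Upload a file, then compare the catalog checksum with a local one."""
    # Assumption: "regChksum" is the string value of irods.keywords.REG_CHKSUM_KW.
    session.data_objects.put(local_path, logical_path, **{"regChksum": ""})

    # Assumption: the registered checksum is available as `checksum`.
    server_checksum = session.data_objects.get(logical_path).checksum
    if not server_checksum:
        raise RuntimeError("no checksum registered for " + logical_path)

    with open(local_path, "rb") as f:
        data = f.read()  # acceptable for a sketch; read in chunks for large files

    # Recompute locally with whichever scheme the server string implies.
    if server_checksum.startswith("sha2:"):
        raw = hashlib.sha256(data).digest()
        local_checksum = "sha2:" + base64.b64encode(raw).decode("ascii")
    else:
        local_checksum = hashlib.md5(data).hexdigest()

    if local_checksum != server_checksum:
        raise ValueError("checksum mismatch for " + logical_path)
    return server_checksum
```

Unlike iput -K, this checks the data after the transfer has completed rather than as part of the transfer itself, so a mismatch leaves the uploaded object in place and the caller has to decide what to do with it.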
While transferring data I noticed that the iRODS server uses different hash schemes for the checksums depending on the client I use.
In my irods_environment.json I set the checksum algorithm as below:
On the server sha256 is the default checksum algorithm. When I use the icommands to upload data, the data is checked by md5 sums:
When I transfer data with the PRC v 2.0.1, sha2 is used as checksum algorithm:
Is there an extra parameter which I have to pass to the PRC to ensure that the data is checksummed by md5?