irods-contrib / irods-cloud-browser

DFC Web Based cloud browser
BSD 2-Clause "Simplified" License

on file upload, checksum is not stored in iCAT #182

Open carsten-jahn opened 8 years ago

carsten-jahn commented 8 years ago

I'm using cloud browser 1.0.1 to upload to iRODS 4.1.8. The core.re rules, e.g. acPostProcForPut, were not modified (i.e. they do not include a call to msiSysChksumDataObj).

I thought with a default jargon.properties being present in the iRODS cloud browser class path (from WEB-INF/lib/jargon-core-4.0.3.1-SNAPSHOT.jar), the existing setting transfer.computeandvalidate.checksum=true would cause a client (=Jargon) checksum to be transferred to the iRODS server during upload. I also assume that iRODS will put it in the catalog then, visible with ils -L. But I don't see a checksum for files uploaded with iRODS cloud browser. I was expecting a behavior similar to iput -K, which does the end-to-end check and also stores the checksum in iCAT.

Is this an issue in iRODS cloud browser, Jargon, or in my understanding? :smiley:

michael-conway commented 8 years ago

Hi Jon, that setting is used in put/get transfers, whereas uploads from the cloud browser use the distinct streaming i/o protocol. That's why no checksum was computed. However, your observation makes me think it would be a nice feature to add to streaming i/o, and I can add that capability.

carsten-jahn commented 8 years ago

Hi Mike, sorry I don't really understand this... is this setting just applicable in certain situations? Thanks!

michael-conway commented 8 years ago

Yes, it was applicable to the put/get transfers (analogous to iput and iget). In iRODS the streaming i/o (open stream, write to stream, close stream) is a distinct code path and doesn't have checksum 'knobs' as part of that protocol.

However, I can observe the setting by wrapping a streaming i/o operation from cloud browser with checksum logic, and I think that's a nice feature. We'll also be adding a compute checksum action in the browser itself in an upcoming feature release.

carsten-jahn commented 8 years ago

Thanks, now I get it. The cloud browser uses the streaming API because it already has a stream from the web upload.

Of course it would be nice to improve this for the cloud browser as you suggest, but isn't it possible to checksum a stream just as well directly in the Jargon API? And the iRODS server feature that receives a stream could checksum what it received as well. I don't know if it does, though. So if it's possible, it would be great to have this directly in Jargon. If not, please add a small "best practice" documentation snippet about the wrapping approach for cloud browser, so that other developers can use it as well.

michael-conway commented 8 years ago

Yes, checksumming of a stream as it 'passes' is in Java already, so we just need to add a hook to use that. We can turn this issue into that feature and do it down in Jargon, that was exactly my thought. Your original point is correct that the behavior of streams versus put/get violates the principle of least surprise.

MC
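
For readers following along, here is a minimal sketch of what checksumming a stream "as it passes" can look like with the standard `java.security.DigestInputStream`. It is not Jargon code and not the eventual implementation: the class name `StreamChecksumSketch`, the method `copyWithChecksum`, and the in-memory target stream are hypothetical stand-ins, and a real upload would write to whatever stream the iRODS client library provides. The digest algorithm is a parameter because the right choice depends on how the server is configured.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamChecksumSketch {

    /**
     * Copies everything from 'in' to 'out' and returns the hex digest of the
     * bytes that passed through. Hypothetical helper, not part of Jargon.
     * Note: 'in' is closed when the copy finishes; 'out' is left open.
     */
    static String copyWithChecksum(InputStream in, OutputStream out, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        // DigestInputStream updates the digest with every byte that is read.
        try (DigestInputStream digestIn = new DigestInputStream(in, digest)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = digestIn.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        // Render the finished digest as lowercase hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the web upload stream and the stream to the server.
        InputStream upload = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream target = new ByteArrayOutputStream();

        String checksum = copyWithChecksum(upload, target, "MD5");
        System.out.println(checksum);
    }
}
```

The data is read only once, so the checksum comes at little extra cost over the copy itself. What to do with the resulting value (compare it with a server-side checksum, or register it in the catalog) is outside the scope of this sketch.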

carsten-jahn commented 8 years ago

Sounds great! Can we move the issue into the Jargon project then?

michael-conway commented 8 years ago

Yes, I will do that this morning.

carsten-jahn commented 8 years ago

created issue: https://github.com/DICE-UNC/jargon/issues/194