irods / irods_capability_automated_ingest


Add multithreaded read and write from S3 to iRODS for `PUT` and `PUT_SYNC` #207

Closed: avkrishnamurthy closed this issue 1 year ago

avkrishnamurthy commented 1 year ago

Currently, the PUT and PUT_SYNC operations for objects coming from an S3 bucket work using a single-stream read and write into iRODS, which is very slow for large files. Adding a multithreaded way to read the object from the S3 bucket and write it into iRODS will speed this up.
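A minimal Python sketch of the chunked, multithreaded copy described above. It assumes the source supports ranged reads (S3 GetObject accepts an HTTP `Range` header) and that each thread can write its chunk at an explicit offset in the destination data object. `chunk_ranges`, `parallel_copy`, `read_range`, `write_at`, `chunk_size` and `num_threads` are hypothetical names, not the automated ingest API. The in-memory buffers stand in for the S3 object and the iRODS data object.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def chunk_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into (offset, length) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]


def parallel_copy(
    size: int,
    read_range: Callable[[int, int], bytes],
    write_at: Callable[[int, bytes], None],
    chunk_size: int = 8 * 1024 * 1024,
    num_threads: int = 4,
) -> int:
    """Copy `size` bytes by reading and writing independent chunks concurrently.

    read_range(offset, length) and write_at(offset, data) are hypothetical
    callbacks standing in for an S3 ranged GET and an iRODS seek + write.
    Returns the total number of bytes written.
    """

    def copy_one(rng: Tuple[int, int]) -> int:
        offset, length = rng
        data = read_range(offset, length)
        if len(data) != length:
            raise IOError(f"short read at offset {offset}: {len(data)} != {length}")
        write_at(offset, data)
        return length

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() re-raises the first worker exception when its result is consumed.
        return sum(pool.map(copy_one, chunk_ranges(size, chunk_size)))


# In-memory stand-ins (hypothetical) for the S3 object and the iRODS data object.
src = bytes(range(256)) * 1000
dst = bytearray(len(src))
dst_lock = threading.Lock()


def read_range(offset: int, length: int) -> bytes:
    return src[offset:offset + length]


def write_at(offset: int, data: bytes) -> None:
    with dst_lock:  # a real destination would use one open handle per thread
        dst[offset:offset + len(data)] = data


copied = parallel_copy(len(src), read_range, write_at, chunk_size=4096, num_threads=8)
```

Because every chunk has a fixed offset, chunks can finish in any order without corrupting the result; the chunk size trades per-request overhead against how evenly the work spreads across threads.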

alanking commented 1 year ago

@avkrishnamurthy - Are we waiting for performance numbers before we close this? We can always open a new issue if we find something. The initial implementation seems complete.

avkrishnamurthy commented 1 year ago

I think it can be closed. If the performance results turn up anything worrying, it could be addressed in a new issue, or in the issue I will create soon about refactoring sync_irods.py and the scanner. I'll leave it to your judgment whether to close it or leave it open, though.

alanking commented 1 year ago

Agreed. Please close!

trel commented 1 year ago

very nice.