Closed avkrishnamurthy closed 1 year ago
In the future, please leave off the pounds. We only add them once the changes have been reviewed and approved.
The reason for this is that if more changes are needed due to the review, pushing more commits causes the issue to be flooded with info about those new commits. So leaving the pounds off guards against that.
if this PR is based on top of #134 then we should just add new commits, to preserve credit.
Other than reaffirming @trel's statement above, this otherwise seems fine. I gather that it is a temporary substitute for the "complete" PUT from S3 solution we discussed yesterday, which will ultimately be multi-read/multi-write (NxN threads) from S3 to iRODS ?
Yes, this is just the initial naive solution using a single stream-to-stream transfer.
and i'm happy to get this merged as stream-to-stream, and then we can optimize in a new issue.
We are working on getting a branch which preserves the original authorship and builds on top of it. Stand by...
oooh, git school is in.
Force-push seems to have been a success.
@avkrishnamurthy - As you address review comments, please click "Resolve conversation" so that we don't have to think about them anymore. If something was not satisfactorily addressed, we will start a new conversation. Thanks!
Getting closer...
Added changes
This looks fine to me as long as it has been tested and performs as expected.
Everything looks good to me!
Perhaps we squash this down to 2-3 commits (omitting #s for now) and have one last look.
squashed
Looks good to me. Let's add the #s
I added the pounds to the 2 commits and to the PR title. Let me know if I need to change anything else.
This fixes the "File not found" error that occurs when trying to run a PUT or PUT_SYNC for files from an S3 resource. I am using MinIO to get the file from the S3 bucket. Currently, the put for S3 objects cannot be parallelized, which will need to be worked on; I'm not sure what to do about that at this point.

I have also included a checksum comparison against the ETag, for both regular objects in an S3 bucket and multipart-uploaded objects. I am not sure this is the correct way to do the checksum, so feedback there would be good. The default part size for multipart uploads to Amazon S3 is 8 MB, but I have read that it is not guaranteed to be the same in all cases, so I have added a command line argument to the ingest job to specify a different chunk size.
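For context on the multipart checksum comparison discussed above: S3's ETag for a multipart upload is not the MD5 of the whole object, but the MD5 of the concatenated per-part MD5 digests, suffixed with `-<part count>`. The sketch below shows that standard reconstruction, assuming the local part size matches the one used at upload time; the function name `multipart_etag` is illustrative and not necessarily what this PR uses.

```python
import hashlib


def multipart_etag(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Reconstruct the ETag S3 would assign to this file.

    chunk_size must match the part size used for the original
    multipart upload (8 MB is a common default, but not guaranteed).
    """
    md5s = []
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5s.append(hashlib.md5(chunk))
    if len(md5s) == 1:
        # Single-part upload: the ETag is just the plain MD5 hex digest.
        return md5s[0].hexdigest()
    # Multipart: MD5 of the concatenated binary part digests, plus "-N".
    combined = hashlib.md5(b"".join(m.digest() for m in md5s))
    return f"{combined.hexdigest()}-{len(md5s)}"
```

This is why a configurable chunk size matters: a file uploaded with a non-default part size produces a different ETag, so the comparison only succeeds when the ingest job is told the same size.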