larrabee / s3sync

Really fast sync tool for S3
GNU General Public License v3.0

s3sync fails for a few files, saying NoSuchKey: The specified key does not exist. #26

Open prudhvigodithi opened 4 years ago

prudhvigodithi commented 4 years ago

s3sync works for some S3 files, but for others it fails with NoSuchKey: The specified key does not exist. Is there anything to fix in the S3 permissions, or am I missing an option with s3sync?

Error log:

ERRO[0000] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: content/directpath/health-check/abd/details/js.properties sync error: NoSuchKey: The specified key does not exist. status code: 404, request id: xxxxxxxxxxxxxx, host id: v5dwefwefwefmkwmfkwa98y79790978ZKpjUNCAkKhE+8697fqbdqtdgbxbx0=, terminating

larrabee commented 4 years ago

Hello. Please check the permissions on the file (content/directpath/health-check/abd/details/js.properties) through the S3 Console. I think your user has no access to this file.
You can also try to download this file via s3cmd.

Another possibility is that the file was deleted after it was listed. You can skip such files with the option --on-fail skipmissing
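For illustration, the flag can simply be appended to the invocation the reporter posts later in this thread (the source URL and the other flags mirror that command; only --on-fail skipmissing is the suggested addition):

s3sync s3://mytests3syncbucket/data/ --s3-retry 3 --s3-acl private -p -w 128 --on-fail skipmissing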

prudhvigodithi commented 4 years ago

Hey, thanks for getting back to me. I have tried the skipmissing option, but that skips the entire content of that folder. I have also checked the permissions from the S3 console: read object is yes, read object permissions is yes, and write object permissions is yes. A plain aws s3 sync command works well and syncs to the local directory.

WARN[0001] Skip missing object: content/00d9ffsdfsdfmkwckscwd16/15719586868768.bytes
WARN[0001] Skip missing object: content/00a329qqhqwdqdqkdkqdiqd2/1mcwkmfwkm2135.properties

I tested with s4cmd and it works fine.

With s3sync I added the following options:

--s3-retry 3 --s3-acl private -p -w 128 -f skipmissing --debug

and I get the following error:

DEBU[0000] Pipeline step: ListSource finished
DEBU[0000] S3 obj content downloading request failed with error: NoSuchKey: The specified key does not exist. status code: 404, request id: XXXXXXX, host id: XXXXXXX

However, under the same S3 bucket, I tested another folder and the sync works:

INFO[0000] Starting sync
DEBU[0000] Listing bucket finished
DEBU[0000] Pipeline step: ListSource finished
DEBU[0000] Pipeline step: LoadObjData finished
DEBU[0000] Pipeline step: ACLUpdater finished
DEBU[0000] Pipeline step: UploadObj finished
DEBU[0000] Pipeline step: Terminator finished
DEBU[0000] All pipeline steps finished
DEBU[0000] Pipeline terminated
INFO[0000] 0 ListSource: Input: 0; Output: 3 (14 obj/sec); Errors: 0
INFO[0000] 1 LoadObjData: Input: 3; Output: 3 (14 obj/sec); Errors: 0
INFO[0000] 2 ACLUpdater: Input: 3; Output: 3 (14 obj/sec); Errors: 0
INFO[0000] 3 UploadObj: Input: 3; Output: 3 (14 obj/sec); Errors: 0
INFO[0000] 4 Terminator: Input: 3; Output: 0 (0 obj/sec); Errors: 0
INFO[0000] Duration: 218.480329ms
INFO[0000] Sync Done

s3sync -version

VersionId: 2.9, commit: 3f7a732dc2c96b73784151ad0da9b93f2d6f4a98, built at: 2019-10-01T08:00:37Z

prudhvigodithi commented 4 years ago

Hey, this looks close to this issue: https://github.com/larrabee/s3sync/issues/24

s3cmd info gives the following output: ERROR: S3 error: 404 (Not Found)

However s3cmd sync works for that directory and files inside it.

s3cmd ls and aws s3 ls list all the directories and the files inside them.

larrabee commented 4 years ago

Issue #24 is related to another bug, which has been fixed.
It's very strange that s3cmd info fails. Can you post the full s3cmd command line?

prudhvigodithi commented 4 years ago

s3cmd info output:

s3cmd info s3://mytests3syncbucket

s3://mytests3syncbucket/ (bucket):
   Location:  us-east-1
   Payer:     BucketOwner
   Expiration Rule: all objects in this bucket will expire in '
   Policy:    none
   CORS:      none
   ACL:       AWS-s3syncbucket: FULL_CONTROL

s3cmd info output with an object:

s3cmd info s3://mytests3syncbucket//content
ERROR: S3 error: 404 (Not Found)

Below is the output of s3cmd sync for the same file that is failing with s3sync.

s3cmd sync s3://mytests3syncbucket//content/directpath/014c1784/data/test-data-1023
download: 's3://mytests3syncbucket//content/directpath/014c1784/data/test-data-1023/0.bytes' -> '/data/014c1784/data/test-data-1023/0.bytes' [1 of 2]
 0 of 0     0% in    0s     0.00 B/s  done
download: 's3://mytests3syncbucket//content/directpath/014c1784/data/test-data-1023/0.properties' -> '/data/014c1784/data/test-data-1023/0.properties' [2 of 2]
 327 of 327   100% in    0s     5.51 kB/s  done
Done. Downloaded 327 bytes in 1.0 seconds, 327.00 B/s.

s3sync failed with the following error:

s3sync s3://mytests3syncbucket/data/ --s3-retry 3 --s3-acl private -p -w 128

INFO[0000] Starting sync
ERRO[0000] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: content/directpath/014c1784/data/test-data-1023/0.bytes sync error: NoSuchKey: The specified key does not exist. status code: 404, request id: 5776977DF8, host id: S39P6rO+1AgQKp0tfkwfkw gjhuhb6588yb86nAwxzhh24=, terminating
INFO[0000] 0 ListSource: Input: 0; Output: 1000 (1952 obj/sec); Errors: 0
INFO[0000] 1 LoadObjData: Input: 1000; Output: 0 (0 obj/sec); Errors: 7
INFO[0000] 2 ACLUpdater: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0000] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0000] 4 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0000] Duration: 512.565058ms
ERRO[0000] Sync Failed

prudhvigodithi commented 4 years ago

Somehow I feel s3sync is not picking up files under prefixes with "//", meaning s3://mytests3syncbucket//content. My data sits in the S3 bucket under an empty folder and then under the content folder, so the path created was s3://mytests3syncbucket//content, with "//" before the content folder. I moved the same file directly under the bucket, s3://mytests3syncbucket, and s3sync worked fine, whereas it failed when the file was under s3://mytests3syncbucket//content. So I'm assuming it throws NoSuchKey when the object is under a "//" folder; correct me if I'm wrong. I have tested this across multiple files and folders under "//", and it didn't work there but worked when they were in /a/b/c format.
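As a rough illustration of the suspected mechanism (this is not s3sync's actual code; the bucket, region, and key below are placeholders taken from this thread): if the object's real key starts with "/" because of the empty top-level folder, a client that trims or normalizes that slash before fetching requests a different key and gets exactly this kind of NoSuchKey response. A minimal sketch using the AWS SDK for Go:

package main

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Key as it actually exists in the bucket: note the leading "/" coming
	// from the "empty folder" in s3://mytests3syncbucket//content.
	realKey := "/content/directpath/014c1784/data/test-data-1023/0.bytes"

	// A client that strips the leading slash asks S3 for a key that does not exist.
	normalized := strings.TrimPrefix(realKey, "/") // "content/directpath/..."

	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := s3.New(sess)

	_, err := svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("mytests3syncbucket"),
		Key:    aws.String(normalized), // requested key != stored key
	})
	if err != nil {
		// Expected here: NoSuchKey: The specified key does not exist.
		fmt.Println("GetObject failed:", err)
	}
}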

larrabee commented 4 years ago

It's a known issue with the double slash. It was fixed in the latest commit, but a build with this fix was not released. I have created a new release with this fix; please try version 2.10 and let me know.

prudhvigodithi commented 4 years ago

Hey, thanks a lot for that release. I have downloaded it and tried the s3sync commands, but I still get the same NoSuchKey error.

s3sync -version

VersionId: 2.10, commit: e0a4585e08c3f78da0a23deca24d540df01146a4, built at: 2019-11-05T15:54:51Z

larrabee commented 4 years ago

I have committed a bugfix to the debug branch; you can build it from there. I'm not sure this bugfix should be merged into the master branch, because it may have strange effects in other scenarios.

prudhvigodithi commented 4 years ago

Hey, sorry I'm late with an update: it's working now. However, when I try to run more than 128 workers it breaks. Also, I suppose this double slash bug exists in the AWS CLI itself?

larrabee commented 4 years ago

Hello.

  1. Does it fail with an out-of-memory error? If yes, you can increase swap.
  2. I don't know. Double-slash objects are created by an incorrect client that does not normalize the object URL (see the sketch below).
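A rough sketch of that kind of client bug (hypothetical writer code, not any specific tool): naively joining a prefix that already ends in "/" with another "/" produces a key containing "//", whereas a normalizing client would collapse it:

package main

import (
	"fmt"
	"path"
)

func main() {
	prefix := "content/" // already ends with a slash
	name := "0.properties"

	// Naive concatenation creates a key with a double slash.
	naive := prefix + "/" + name
	fmt.Println(naive) // content//0.properties

	// A normalizing client collapses the duplicate slash instead.
	fmt.Println(path.Join(prefix, name)) // content/0.properties
}
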
kannanvr commented 4 years ago

@larrabee, generally this issue occurs if the source and destination bucket users are different. If we add the bucket permission so that the destination user can access the source bucket, then we can avoid this issue.

larrabee commented 4 years ago

@kannanvr, hello. I think that is not a related issue. Can you create a new issue and provide the full command line (without keys)?