Open noelforte opened 1 year ago
Thank you for the very detailed report.
Can you please run the following code and paste the outputs of the two calls here:
import { S3SyncClient, ListLocalObjectsCommand, ListBucketObjectsCommand } from 's3-sync-client';
const client = new S3SyncClient({ /* your config */ });
console.log(
  await client.send(
    new ListLocalObjectsCommand({
      directory: 'output',
    })
  )
);
console.log(
  await client.send(
    new ListBucketObjectsCommand({
      bucket: 'my-bucket',
      prefix: 'path/to/directory',
    })
  )
);
The diff code for updates is pretty simple:
if (
  sourceObject.size !== targetObject.size ||
  (options?.sizeOnly !== true &&
    sourceObject.lastModified > targetObject.lastModified)
) {
  updated.push(sourceObject);
}
Let's see if the issue comes from values or maybe value types.
Sure thing, here's the local object output, truncated for brevity:
[
  LocalObject {
    id: 'test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624000,
    isExcluded: false,
    path: 'output/test-obj-a.jpg'
  },
  LocalObject {
    id: 'test-obj.b.jpg',
    size: 385869,
    lastModified: 1685735634000,
    isExcluded: false,
    path: 'output/test-obj.b.jpg'
  }
]
and the bucket object output:
[
  BucketObject {
    id: 'test/test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624935,
    isExcluded: false,
    bucket: '...',
    key: 'test/test-obj-a.jpg'
  },
  BucketObject {
    id: 'test/test-obj-b.jpg',
    size: 385869,
    lastModified: 1685735766762,
    isExcluded: false,
    bucket: '...',
    key: 'test/test-obj-b.jpg'
  }
]
Looks like the lastModified values of the local files are being rounded down to whole seconds (multiples of 1000 ms).
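To make the effect concrete, here is a minimal sketch that applies the update check quoted earlier to the values reported for test-obj-a.jpg. The `needsUpdate` wrapper is a hypothetical helper written for this illustration, not a function exported by s3-sync-client, and it assumes the sync runs from bucket to local, so the bucket object is the source.

```javascript
// Hypothetical wrapper around the update check quoted earlier in this thread.
// It is NOT part of s3-sync-client; it only mirrors the quoted condition.
function needsUpdate(sourceObject, targetObject, options) {
  return (
    sourceObject.size !== targetObject.size ||
    (options?.sizeOnly !== true &&
      sourceObject.lastModified > targetObject.lastModified)
  );
}

// Values copied from the outputs above for test-obj-a.jpg.
const bucketObject = { size: 1667804, lastModified: 1685735624935 };
const localObject = { size: 1667804, lastModified: 1685735624000 };

// Assumption: bucket -> local sync, so the bucket object is the source.
// Sizes match, but the bucket timestamp is 935 ms ahead of the local one.
console.log(needsUpdate(bucketObject, localObject)); // true
```

If this reading is right, the sub-second remainder alone is enough to mark an otherwise identical file as updated on every run.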
I ran some tests on S3, and it seems that AWS doesn't store milliseconds for the LastModified field.
Ref: https://github.com/aws/aws-cli/issues/5369
My test with official AWS SDK commands:
await s3Client.send(
  new PutObjectCommand({
    Bucket: BUCKET_2,
    Key: 'def/jkl/xmoj',
    Body: Buffer.from('0x1234', 'hex'),
  })
);
console.log(
  (
    await s3Client.send(
      new ListObjectsV2Command({
        Bucket: BUCKET_2,
        Prefix: 'def/jkl/xmoj',
      })
    )
  ).Contents.map(({ LastModified }) => LastModified.getTime())
);
// => [ 1685740748000 ]
console.log(
  (
    await s3Client.send(
      new GetObjectCommand({
        Bucket: BUCKET_2,
        Key: 'def/jkl/xmoj',
      })
    )
  ).LastModified.getTime()
);
// => 1685740748000
Can you run the last two commands on test/test-obj-a.jpg? Here, s3Client is an S3Client instance from the official SDK.
I have the feeling your provider (or the official AWS SDK) might return inconsistent timestamps between ListObjectsV2Command and GetObjectCommand, which would explain the issue.
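One way such a mismatch could arise, offered as an assumption rather than something confirmed in this thread: GetObject reports LastModified through the HTTP Last-Modified response header, whose date format has one-second resolution, while ListObjectsV2 returns timestamps in the response body, where a provider may include milliseconds. The sketch below parses the same instant in both formats; the two date strings are illustrative examples derived from the test-obj-a.jpg timestamps, not captured responses.

```javascript
// Example strings only (assumed for illustration, not captured from a provider).
// ISO 8601 timestamp, as a listing response body could carry it: keeps milliseconds.
const fromListing = new Date('2023-06-02T19:53:44.935Z');
// HTTP-date, as used by the Last-Modified header: whole seconds only.
const fromHeader = new Date('Fri, 02 Jun 2023 19:53:44 GMT');

console.log(fromListing.getTime()); // 1685735624935
console.log(fromHeader.getTime()); // 1685735624000
```

A file whose modification time is set from the header-derived value would then always look older than the listing-derived value for the same object.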
You are correct, that is the case. Here's the output:
console.log(
  (
    await clientS3.send(
      new ListObjectsV2Command({
        Bucket: env.S3_BUCKET,
        Prefix: 'test/test-obj-a.jpg',
      })
    )
  ).Contents.map(({ LastModified }) => LastModified.getTime())
);
// => [ 1685735624935 ]
console.log(
  (
    await clientS3.send(
      new GetObjectCommand({
        Bucket: env.S3_BUCKET,
        Key: 'test/test-obj-a.jpg',
      })
    )
  ).LastModified.getTime()
);
// => 1685735624000
Is there anything that can be done to work around that by disregarding the milliseconds if they are returned?
I'm not sure we can safely round or truncate values. If you look at test-obj.b.jpg in https://github.com/jeanbmar/s3-sync-client/issues/53#issuecomment-1574246333, we get lastModified values of 1685735634000 (local) and 1685735766762 (bucket), about 133 seconds apart, while the sizes are the same.
I would suggest opening a ticket with the provider and, in the meantime, using the sizeOnly: true option when syncing. Size comparison should be good enough.
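As a rough sketch of that suggestion, the call from the original report could pass sizeOnly alongside del. The `syncIgnoringTimestamps` wrapper is a hypothetical name used only for this example; it expects the `sync` method taken from an S3SyncClient instance, and the bucket path and local directory are the placeholder values from the report.

```javascript
// Hypothetical wrapper, not part of s3-sync-client. `sync` is expected to be
// the method obtained from a client, for example:
//   const { sync } = new S3SyncClient({ /* your config */ });
function syncIgnoringTimestamps(sync) {
  return sync('s3://my-bucket/path/to/directory', 'output', {
    del: true,
    sizeOnly: true, // compare sizes only, so timestamp drift is ignored
  });
}

// Usage (returns whatever sync returns, a promise in practice):
//   const results = await syncIgnoringTimestamps(sync);
```

The trade-off is that a file changed remotely without changing its size would no longer be detected.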
Whoops! That was a mistake on my part. I think I changed something in https://github.com/jeanbmar/s3-sync-client/issues/53#issuecomment-1574246333 that caused the times to shift (test-obj-a vs test-obj.a), so that's where the inconsistency came from. After the last test I did in https://github.com/jeanbmar/s3-sync-client/issues/53#issuecomment-1574665098, I made sure that the local and remote objects were identical, and this is the output:
[
  LocalObject {
    id: 'test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624000,
    isExcluded: false,
    path: 'output/test-obj-a.jpg'
  },
  LocalObject {
    id: 'test-obj-b.jpg',
    size: 385869,
    lastModified: 1685735766000,
    isExcluded: false,
    path: 'output/test-obj-b.jpg'
  }
]
[
  BucketObject {
    id: 'test/test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624935,
    isExcluded: false,
    bucket: 'my-bucket',
    key: 'test/test-obj-a.jpg'
  },
  BucketObject {
    id: 'test/test-obj-b.jpg',
    size: 385869,
    lastModified: 1685735766762,
    isExcluded: false,
    bucket: 'my-bucket',
    key: 'test/test-obj-b.jpg'
  }
]
As you can see, the timestamps for each object are exactly the same apart from the milliseconds, so it doesn't appear to be an issue with the provider.
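The claim can be checked with a short sketch that compares the reported values at one-second granularity. `toWholeSeconds` is a hypothetical helper for this illustration and is not something s3-sync-client provides; the numbers are copied from the output above.

```javascript
// Hypothetical helper, not part of s3-sync-client: drop the millisecond part.
const toWholeSeconds = (ms) => Math.floor(ms / 1000);

// lastModified values copied from the output above.
const pairs = [
  { id: 'test-obj-a.jpg', local: 1685735624000, bucket: 1685735624935 },
  { id: 'test-obj-b.jpg', local: 1685735766000, bucket: 1685735766762 },
];

const report = pairs.map(({ id, local, bucket }) => ({
  id,
  driftMs: bucket - local, // sub-second remainder kept by the bucket listing
  sameSecond: toWholeSeconds(bucket) === toWholeSeconds(local),
}));

console.log(report);
// [
//   { id: 'test-obj-a.jpg', driftMs: 935, sameSecond: true },
//   { id: 'test-obj-b.jpg', driftMs: 762, sameSecond: true }
// ]
```

Comparing whole seconds would treat both files as unchanged, though it would also miss a real modification that lands within the same second, which is one reason truncation may not be safe in general.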
Environment
NodeJS v18.16.0
macOS 13.4 Ventura

Steps to reproduce
Sample code for how I'm invoking s3-sync-client (with sensitive values stripped):

```js
// Initialize env
import 'dotenv/config';

// Load internal modules
import path from 'node:path';
import { env, exit } from 'node:process';

// Load external modules
import { S3SyncClient } from 's3-sync-client';

// Initialize client
const { sync } = new S3SyncClient({
  region: ***,
  endpoint: ***,
  forcePathStyle: false,
  credentials: {
    accessKeyId: ***,
    secretAccessKey: ***,
  },
});

const results = await sync(
  `s3://my-bucket/path/to/directory`,
  'output',
  {
    del: true,
  }
);

console.log(results);
```

Expected result
Items that are unchanged from the remote to the local system should not be recopied.

Actual result
Even after an initial successful sync to the local filesystem, s3-sync-client continues to redownload files that haven't changed, incurring additional bandwidth charges.

Here's a screen capture of the network requests going across:

![Screen Recording 2023-06-01 at 8 14 17 PM](https://github.com/jeanbmar/s3-sync-client/assets/2560683/e55e7d6b-d5bd-49e4-9fe9-1425a46b3e6d)

And the resulting output:

```js
{
  created: [],
  updated: [
    BucketObject {
      id: 'dim-gunger-UO2hOHLq9Y0-unsplash.jpg',
      size: 1667804,
      lastModified: 1685662029901,
      isExcluded: false,
      bucket: '***',
      key: 'test/dim-gunger-UO2hOHLq9Y0-unsplash.jpg'
    },
    BucketObject {
      id: 'luka-verc-D-ChPtXJhXA-unsplash.jpg',
      size: 2448935,
      lastModified: 1685662029901,
      isExcluded: false,
      bucket: '***',
      key: 'test/luka-verc-D-ChPtXJhXA-unsplash.jpg'
    },
    BucketObject {
      id: 'planet-volumes-6tI9Fk5p4bo-unsplash.jpg',
      size: 385869,
      lastModified: 1685662029923,
      isExcluded: false,
      bucket: '***',
      key: 'test/planet-volumes-6tI9Fk5p4bo-unsplash.jpg'
    },
    BucketObject {
      id: 'the-cleveland-museum-of-art-AiD3Pkwmtt0-unsplash.jpg',
      size: 3881833,
      lastModified: 1685662030480,
      isExcluded: false,
      bucket: '***',
      key: 'test/the-cleveland-museum-of-art-AiD3Pkwmtt0-unsplash.jpg'
    },
    BucketObject {
      id: 'yannick-apollon-rYXkqDZxfaw-unsplash.jpg',
      size: 13356953,
      lastModified: 1685662030513,
      isExcluded: false,
      bucket: '***',
      key: 'test/yannick-apollon-rYXkqDZxfaw-unsplash.jpg'
    }
  ],
  deleted: []
}
```

Other items of note:
s3-sync-client when performing the diff is the same time as what's on the filesystem and on the WebUI for the S3 storage services, so any drift differences, if they exist, aren't visible, at least to my eyes as an end user.

Happy to provide any other relevant details!