treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0
4.46k stars 359 forks source link

Upload Object: Align lakeFS object mtime with block adapter mtime as much as possible #8328

Closed N-o-Z closed 3 weeks ago

N-o-Z commented 3 weeks ago

lakeFS mtime and storage mtime may vary due to several reasons. Changes in metadata for example where the object is updated in lakeFS only can cause these values to be completely different. Recently we've encountered issues with Databricks and S3GW where objects which had a lakeFS mtime < storage mtime would fail batch jobs. This is because Databricks started using IfUnmodifiedSince header in their request to validated objects were not modified during job. The failure results from Databricks reading mtime from lakeFS and then sending request directly to S3 using IfUnmodifiedSince with that value.

An initial fix to lakeFS was done ensuring creation time of object in lakeFS is calculated only after object was uploaded to storage. This fix seems insufficient especially for small objects as small skews in clock between lakeFS server and storage service could result in the aforementioned state.

The proposed solution is to provide object mtime or server time from the put object response as possible to avoid this situation.

arielshaqed commented 3 weeks ago

Here's another one! It gives an incorrect time in half the cases on the S3 gateway -- and makes lakeFS respond with a Last-Modified time that is 1 second before the S3 Last-Modified header. Even when they have the same time.

Fortunately the fix for this one is easy enough, I have an easy reproduction.

Intrigued? This is for a 50-MiB object that I uploaded in order to test the fix for #8311:

❯ lakectl -c ~/.lakectl.s3.yaml fs stat lakefs://sample1/boo/xyzzy/foo 
Path: xyzzy/foo
Modified Time: 2024-10-30 16:24:15 +0200 IST
Size: 4 bytes
Human Size: 4 B
Physical Address: https://treeverse-ariels-test-us.s3.us-east-1.amazonaws.com/repos/sample1/data/gcmc37nqmpl146vlicg0/csh433vqmpl146vlicig%2CCPR8TZMSXp2nLbbuIjptMNjUtQAuLBkAxpqqiyKOomM?DELETED-THIS-STUFF
Physical Address Expires: 2024-10-31 08:49:35 +0200 IST
Checksum: d3b07384d113edec49eaa6238ad5ff00
Content-Type: application/octet-stream
❯ http --head  get 'https://treeverse-ariels-test-us.s3.us-east-1.amazonaws.com/repos/sample1/data/gcmc37nqmpl146vlicg0/csh433vqmpl146vlicig%2CCPR8TZMSXp2nLbbuIjptMNjUtQAuLBkAxpqqiyKOomM?DELETED-THIS-STUFF' 
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 4
Content-Type: binary/octet-stream
Date: Thu, 31 Oct 2024 06:34:48 GMT
ETag: "d3b07384d113edec49eaa6238ad5ff00"
Last-Modified: Wed, 30 Oct 2024 14:24:16 GMT
Server: AmazonS3
x-amz-id-2: nEw8qFHG08fkV8KNgAB9oYpw0lSLZx4JYDXgnhiduu9BU1qLeZHcnRhJ0Vnv/tWKAdYg6zn/SiQ=
x-amz-request-id: DHT8D2M278J2QBXD
x-amz-server-side-encryption: AES256

The Last-Modified date from lakeFS is 1 second before the same date on AWS. This looks like rounding error when converting accurate time.Time to 1-second precision for Last-Modified 🤦🏽

N-o-Z commented 3 weeks ago

The fix in question is for put object only, and not for MPU. I think we should separate the issues and the fixes