treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

Research: use `Cache-Control` when interacting with underlying storage #4627

Open johnnyaug opened 1 year ago

johnnyaug commented 1 year ago

Much of the data written by lakeFS to the underlying storage is immutable, including physical paths of actual data, and Graveler ranges and metaranges. Consider the option to add the Cache-Control header to the objects.

This way, clients can cache these objects.
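As a rough illustration of what "clients can cache these objects" could mean in practice, here is a minimal sketch of how a client might decide whether a cached copy is still usable from its `Cache-Control` value. The function names `parse_cache_control` and `is_fresh` are hypothetical, and real HTTP caches handle many more directives than this.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into lowercase directive -> argument (or None)."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directives without "=" (e.g. "immutable") carry no argument.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def is_fresh(header: str, age_seconds: int) -> bool:
    """Return True if a cached copy of this age may be reused without revalidation."""
    directives = parse_cache_control(header)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # no explicit lifetime: be conservative and refetch
    return age_seconds < int(max_age)
```

For example, `is_fresh("max-age=3600", 10)` is true, while a copy that is already 3600 seconds old is treated as stale.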

john-zielke-snkeos commented 1 year ago

The Mozilla docs provide some guidance on which directive to set. `max-age` is widely supported, so setting it to a high value (e.g. one year) makes sense, in addition to something like `immutable`.
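A minimal sketch of what that header value could look like, assuming a one-year `max-age` combined with `immutable`. The helper names are hypothetical and this is not lakeFS code. The resulting dictionary uses the `CacheControl` parameter that boto3's S3 `put_object` accepts, but the sketch only builds the arguments and does not call any storage API.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds


def immutable_cache_control(max_age_seconds: int = ONE_YEAR_SECONDS) -> str:
    """Build a Cache-Control value for an object that is never rewritten."""
    if max_age_seconds < 0:
        raise ValueError("max-age must be non-negative")
    return f"max-age={max_age_seconds}, immutable"


def put_object_kwargs(bucket: str, key: str, body: bytes, immutable: bool) -> dict:
    """Assemble keyword arguments for an S3-style put_object call.

    Assumption: the caller already knows whether the object is immutable
    (for example, a data file or a Graveler range); how that is decided is
    outside this sketch.
    """
    kwargs = {"Bucket": bucket, "Key": key, "Body": body}
    if immutable:
        kwargs["CacheControl"] = immutable_cache_control()
    return kwargs
```

With boto3 installed, the result could be passed along as `boto3.client("s3").put_object(**kwargs)`. Objects that may still change simply get no header.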

arielshaqed commented 1 year ago

I agree, it sounds like a useful addition for lakeFS with a remote object store, or for serving objects off of lakeFS to a remote client that uses direct access! But I want us to be cautious here.

@john-zielke-snkeos, I completely understand your frustration about our caution here, and I share it to some extent. I believe this belongs on the issue, so I am copying it over here. Let me try to explain the forces that act against this. The basic issue is that it's a very broad front.

On the server side it's "just a small matter of programming 😝": We would need to add support to lakeFS, of course -- and possibly to the GCS and Azure adapters too.

But S3 is not a single well-defined implementation; it's a de facto standard that has no formal definition. And we want to support it wherever possible... I am not even sure which implementations to pick to verify that this works, or how to do so, not even if I relax the requirement from "sends back the same Cache-Control header" to "doesn't do something unexpectedly bad when it sees a Cache-Control header". As a cautious developer, I would translate that into adding and documenting another option, which in turn translates to increased cognitive load for our users.
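To make the two requirement levels above concrete, here is a hedged sketch of how a compatibility check against one S3-like implementation might classify what came back after an upload. The `RoundTrip` enum and `classify_round_trip` function are hypothetical names. The sketch covers only the stricter "sends back the same header" check and says nothing about detecting "unexpectedly bad" behavior.

```python
from enum import Enum
from typing import List, Mapping, Optional


class RoundTrip(Enum):
    ECHOED = "echoed"    # store returned an equivalent Cache-Control value
    DROPPED = "dropped"  # store accepted the upload but returned no header
    ALTERED = "altered"  # store returned a different value


def _normalize(value: str) -> List[str]:
    """Compare directives case-insensitively and regardless of order."""
    return sorted(d.strip().lower() for d in value.split(",") if d.strip())


def classify_round_trip(sent: str, response_headers: Mapping[str, str]) -> RoundTrip:
    """Classify the Cache-Control value a store returns for an uploaded object."""
    received: Optional[str] = None
    for name, value in response_headers.items():
        if name.lower() == "cache-control":  # header names are case-insensitive
            received = value
            break
    if received is None:
        return RoundTrip.DROPPED
    return RoundTrip.ECHOED if _normalize(received) == _normalize(sent) else RoundTrip.ALTERED
```

Under the relaxed requirement, `DROPPED` would probably be acceptable and `ALTERED` would need a closer look. Which implementations to run such a check against remains the open question.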

github-actions[bot] commented 10 months ago

This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.