splitgraph / seafowl

Analytical database for data-driven Web applications 🪶
https://seafowl.io
Apache License 2.0

Allow authenticated cache-friendly query #300

Closed: tv42 closed this issue 1 year ago

tv42 commented 1 year ago

https://seafowl.io/docs/guides/querying-cache-cdn says:

If you disable Seafowl reads or make them password-protected by setting read = "password"/"off", this will also disable the cached GET endpoint.

There should be no need for that. If you add Authorization to the Vary response header, caches should behave correctly for requests with different or missing Authorization headers.
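To illustrate the suggestion, here is a TypeScript sketch of both halves of Vary: the headers an origin might send, and a simplified model of how a standards-following cache folds the headers named in Vary into its lookup key. The function names and the `max-age` value are illustrative assumptions, not Seafowl's actual behavior.

```typescript
// Headers an origin could attach to a query response so that caches keep one
// entry per credential. Per RFC 9111, section 3.5, "public" is one of the
// directives that lets a shared cache reuse a response to a request carrying
// Authorization.
function authenticatedCacheHeaders(maxAgeSeconds: number): Record<string, string> {
  return {
    "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    Vary: "Authorization",
  };
}

// Simplified model of a Vary-respecting cache key: the URL plus the request's
// value for every header named in Vary (header names are case-insensitive).
// An absent header is encoded as null, so an anonymous request never matches
// an entry stored for an authorized one.
function varyCacheKey(
  url: string,
  vary: string,
  requestHeaders: Record<string, string>,
): string | null {
  const names = vary
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter((h) => h.length > 0);
  if (names.includes("*")) {
    return null; // "Vary: *" means a stored response can never be reused
  }
  const lowered = new Map<string, string>();
  for (const [name, value] of Object.entries(requestHeaders)) {
    lowered.set(name.toLowerCase(), value);
  }
  const parts = names.map((n) => `${n}=${JSON.stringify(lowered.get(n) ?? null)}`);
  return [url, ...parts].join("\n");
}
```

With these semantics, requests from two different users, or one anonymous request, land on three distinct cache entries for the same URL.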

mildbyte commented 1 year ago

@tv42 Hey, thanks for the suggestion!

We tried adding it in the above PR, but noticed that Cloudflare doesn't respect the Vary header, which, according to their docs, leaves us with two options.

We could also figure out some signature-in-URL scheme to make sure the intermediate caches use the read password as a cache key without revealing it, e.g. by making the request /q/query-hash-or-urlencoded-query?signature=HMAC(query, read_password), but it would be interesting to find out your use case / preferences for this.
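A minimal sketch of the signature-in-URL idea, in TypeScript for Node.js. It assumes SHA-256 for both the query hash and the HMAC, hex encoding, and the hypothetical helpers `signQuery`, `signedQueryPath` and `verifySignature`. None of this is Seafowl's actual URL format, and it leaves out how the query text itself reaches the server.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical scheme from the discussion:
//   /q/<query-hash>?signature=HMAC(query, read_password)
// SHA-256 and hex encoding are assumptions made for this sketch.
function signQuery(query: string, readPassword: string): string {
  return createHmac("sha256", readPassword).update(query).digest("hex");
}

// Client side: build the cacheable URL path without revealing the password.
function signedQueryPath(query: string, readPassword: string): string {
  const queryHash = createHash("sha256").update(query).digest("hex");
  return `/q/${queryHash}?signature=${signQuery(query, readPassword)}`;
}

// Server side: recompute the HMAC and compare it in constant time.
function verifySignature(query: string, readPassword: string, signature: string): boolean {
  const expected = Buffer.from(signQuery(query, readPassword), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Because the signature depends on the read password, intermediate caches keep separate entries per password without ever seeing it. The trade-off is that anyone holding the URL can replay the query, so the URL itself becomes a secret.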

tv42 commented 1 year ago

Well that Cloudflare behavior sucks :-(

I think one could work around it by explicit use of the Cache API in a Cloudflare Worker, but that does mean more setup needed.
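A rough sketch of that Worker-side workaround: key the cache on a hash of the Authorization header, so Cloudflare stores one entry per credential even though it ignores Vary. To keep it runnable outside Cloudflare, a `Map` stands in for the Workers Cache API and `upstream` stands in for `fetch()` to Seafowl. The `__auth` query parameter and all function names are hypothetical. In an actual Worker you would probably use `caches.default` and `crypto.subtle.digest` instead.

```typescript
import { createHash } from "node:crypto";

// Stand-in for fetch() to Seafowl; a real Worker would forward the
// Authorization header so Seafowl still enforces read access on a miss.
type Upstream = (
  url: string,
  authorization: string | null,
) => Promise<{ status: number; body: string }>;

// Derive a per-credential cache key without embedding the secret: the
// Authorization value is hashed and added as an extra query parameter.
function cacheKeyFor(requestUrl: string, authorization: string | null): string {
  const key = new URL(requestUrl);
  const credHash = createHash("sha256").update(authorization ?? "").digest("hex");
  key.searchParams.set("__auth", credHash); // "__auth" is a made-up parameter name
  return key.toString();
}

// Serve from the cache when possible; otherwise ask the upstream and store
// only 200 responses, so failed (e.g. 401) responses are never cached.
async function cachedQuery(
  requestUrl: string,
  authorization: string | null,
  cache: Map<string, string>,
  upstream: Upstream,
): Promise<{ status: number; body: string }> {
  const key = cacheKeyFor(requestUrl, authorization);
  const hit = cache.get(key);
  if (hit !== undefined) {
    return { status: 200, body: hit };
  }
  const res = await upstream(requestUrl, authorization);
  if (res.status === 200) {
    cache.set(key, res.body);
  }
  return res;
}
```

A cache hit is served without asking Seafowl, so access control rests on the fact that only a client presenting the same Authorization value produces the same key. One consequence worth noting: a rotated or revoked password keeps working for already-cached queries until those entries are evicted or expire.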

I'm mostly looking at running an internal-only dashboard. No anonymous access, but caches would be good.

HMAC-in-URL would work, but that means you have secret URLs, which tends to be a bad idea.

Feel free to close this issue as can't-do-easily. If I feel the need strongly enough[1], I'll write the Cloudflare Worker to cache explicitly; it should be about 15 lines.

[1]: My dashboard is using Postgres for now, as I'm trying to reuse existing software and didn't want to spend the effort on writing a client adapter for that one client project.

milesrichardson commented 1 year ago


FYI, I've been working on https://github.com/splitgraph/madatdata which is a TypeScript library for querying and managing SQL databases (so far including Seafowl and Splitgraph, but with an interface that makes it easy to add plugins for other databases). It's WIP/alpha stage right now, but Seafowl integration is working. Here is an example of using Seafowl in a Next.js app. Also note the readme is outdated, but I'm updating it today (or I'm planning to, at least).

(no idea if that's relevant to you or what language you're writing the dashboard in)

gruuya commented 1 year ago

For now I've enabled the GET endpoint to work even when read access is restricted (including authz enforcement). It still won't get cached in shared caches (but it will in private ones), so it is effectively a compromise.
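For background, the general HTTP rule behind this compromise can be sketched as a predicate. This is a simplified reading of RFC 9111 (it ignores status codes, freshness and qualified directives such as private="Set-Cookie"), the function name is hypothetical, and it does not describe the exact headers Seafowl sends. Private caches such as a browser's are not bound by the Authorization restriction, which is why they can still cache these responses.

```typescript
// Simplified check for whether a *shared* cache (CDN, proxy) may store a
// response, following RFC 9111. Not a complete implementation.
function sharedCacheMayStore(requestHasAuthorization: boolean, cacheControl: string): boolean {
  // Keep only directive names, e.g. "s-maxage=300" -> "s-maxage".
  const directives = cacheControl
    .split(",")
    .map((d) => d.trim().toLowerCase().replace(/=.*$/, ""))
    .filter((d) => d.length > 0);
  // "no-store" forbids all caches; "private" forbids shared ones (RFC 9111, 5.2.2).
  if (directives.includes("no-store") || directives.includes("private")) {
    return false;
  }
  // Responses to requests with Authorization need explicit permission (RFC 9111, 3.5).
  if (requestHasAuthorization) {
    return ["public", "s-maxage", "must-revalidate"].some((d) => directives.includes(d));
  }
  return true;
}
```

Under this rule, an authorized request answered with a plain `max-age` is not stored by a CDN, which matches the "private caches only" behavior described above.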

I will re-evaluate other options for getting shared caches to cache authz-ed responses (besides Vary-ing by Authorization), given that we plan on changing our cached GET API anyway: #55