splitgraph / seafowl

Analytical database for data-driven Web applications 🪶
https://seafowl.io
Apache License 2.0
390 stars 9 forks source link

Caching queries #416

Closed davidgasquez closed 1 year ago

davidgasquez commented 1 year ago

Hey there! I wanted to run some benchmarks with Seafowl and wasn't sure how to make Seafowl hit the query cache.

My seafowl.toml looks like this:

[object_store]
type = "s3"
access_key_id = "123"
secret_access_key = "123"
endpoint = "https://123.r2.cloudflarestorage.com"
bucket = "name"

# Store the catalog on the local disk
[catalog]
type = "sqlite"
dsn = "seafowl.sqlite"

# Configure the HTTP frontend
[frontend.http]
bind_host = "127.0.0.1"
bind_port = 8080
read_access = "any"
write_access = "123"

I've run the following query to create a table:

curl -i -H "Content-Type: application/json" -H "Authorization: Bearer 123" localhost:8080/q -d@- <<EOF
{"query": "CREATE EXTERNAL TABLE tripdata \
STORED AS PARQUET \
LOCATION 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet';
CREATE TABLE tripdata AS SELECT * FROM staging.tripdata;
"}
EOF

And am querying it like this:

curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer 123"\
  http://localhost:8080/q -d@-<<EOF
{"query": "CREATE TABLE new_table (key INT, value VARCHAR)"}
EOF

Is it expected for Seafowl to reach out to the object store with the same exact query or am I missing some configuration to save the results in a cache?

gruuya commented 1 year ago

Hi, good question! I'm assuming you're not referring to HTTP caching given that you're running Seafowl locally (in case you do, here are the docs that describe that scenario: https://seafowl.io/docs/guides/querying-cache-cdn).

There is an additional caching option, which is to persist the objects in a local file system cache; this feature is a bit hidden in the docs though: https://seafowl.io/docs/reference/seafowl-toml-configuration#object_storecache_properties-section

Here's an example of that option in our integration tests: https://github.com/splitgraph/seafowl/blob/main/tests/statements/mod.rs#L71-L80

Does that help?

(Also note that we keep some benchmarks in a Observable notebook, though I believe that uses a local FS object store to begin with.)

davidgasquez commented 1 year ago

Yes! I think I was looking for something like object_store.cache_properties will try that out @gruuya. :smile:

Thanks a lot!