splitgraph / seafowl

Analytical database for data-driven Web applications 🪶
https://seafowl.io
Apache License 2.0

Exit after not receiving any queries for a specified interval #306

Closed: aleda145 closed this issue 1 year ago

aleda145 commented 1 year ago

Hey!

I want to deploy Seafowl in a serverless way on Fly.io Machines. They support scaling to zero, but only when the process running inside the machine exits.

So I'm curious whether this feature is on the roadmap.

My use case is a dashboard that isn't accessed very often, but when it is, it runs several queries at once. It would be really cool to let the machine spin down when it isn't receiving any requests.

I suppose I could also do this with a shell script baked into the Docker container that terminates the Seafowl process after a while. Feel free to close the issue if you think that's a better solution than adding it to the codebase! 😄
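A minimal sketch of the idle-exit logic requested in this issue, in Python, assuming something in front of Seafowl (a small proxy or wrapper, hypothetical here) calls `touch()` on every incoming query. `IdleTracker` and `watch` are illustrative names, not part of Seafowl; the clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import threading
import time
from typing import Callable


class IdleTracker:
    """Remembers when the last query arrived (hypothetical helper, not a Seafowl API)."""

    def __init__(self, idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds
        self._clock = clock
        self._lock = threading.Lock()  # touch() may be called from request threads
        self._last_seen = clock()

    def touch(self) -> None:
        """Record activity; a proxy in front of Seafowl would call this per query."""
        with self._lock:
            self._last_seen = self._clock()

    def idle_for(self) -> float:
        """Seconds since the last recorded query."""
        with self._lock:
            return self._clock() - self._last_seen

    def should_exit(self) -> bool:
        return self.idle_for() >= self.idle_seconds


def watch(tracker: IdleTracker, on_idle: Callable[[], None],
          poll_seconds: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll until the idle interval has elapsed, then call on_idle once.

    In a container, on_idle could stop the Seafowl child process
    (e.g. proc.terminate() on a subprocess.Popen) so the machine can
    scale to zero.
    """
    while not tracker.should_exit():
        sleep(poll_seconds)
    on_idle()


if __name__ == "__main__":
    # Fake clock so the example runs instantly.
    fake_now = [0.0]
    tracker = IdleTracker(idle_seconds=300, clock=lambda: fake_now[0])
    fake_now[0] = 120.0
    tracker.touch()                 # a query arrives at t = 120 s
    fake_now[0] = 400.0
    print(tracker.should_exit())    # False: only 280 s idle

    def fake_sleep(seconds: float) -> None:
        fake_now[0] += seconds

    watch(tracker, on_idle=lambda: print("idle at", fake_now[0]),
          poll_seconds=10, sleep=fake_sleep)  # prints "idle at 420.0"
```

Polling keeps the wrapper simple; the trade-off is that the process may exit up to `poll_seconds` after the exact idle deadline.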

mildbyte commented 1 year ago

Hey, thanks for the idea!

It's not on the roadmap yet, but it sounds interesting, since we definitely want to be able to handle FaaS-like use cases.

I kinda understand Fly's reasoning for letting the process itself exit when it thinks it's idle. But for apps that only handle HTTP requests, I'd expect Fly to have a better idea of when to terminate the machine, since they control the ingress HTTP proxy and can both see when the app hasn't served any requests and handle routing/rerouting to a different instance.

How are you intending to start the Fly machine right now when your dashboard emits a query? It looks like something has to hit the "start machine" endpoint, and that can't be done from the client side. Is there some backend component or edge function that spins the machine up and forwards the query to it?

aleda145 commented 1 year ago

> How are you intending to start the Fly machine right now when your dashboard emits a query?

The machine starts when it gets an HTTP request, so I just send a curl request to the Fly.io address to start it!

> since they control the ingress HTTP proxy and can both see when the app hasn't served any requests and handle routing/rerouting to a different instance.

That's very true. I'm pretty sure Google Cloud Run handles it like that. Now that I think about it, I actually have a Go web server on Cloud Run that scales to zero after not receiving any requests for ~15 minutes.

I might deploy Seafowl there instead of on Fly.io Machines: less coding, so I can focus on the dashboard!

I hope to spend some time hacking on this over the weekend; I'll get back to this issue if I have something to add 😄

aleda145 commented 1 year ago

Hey again! I got Seafowl up and running on Google Cloud Run, and it works great there! Startup times are always <1s, really nice!

The guide for setting up Cloudflare as a caching mechanism was also easy to adapt to Cloud Run! ⭐

I have it deployed on my site for querying Swedish real estate prices (~1M rows). Here it is if you want to check out Seafowl in the wild: https://bostadsbussen.se/sold/query

One thing I've noticed with running it serverless and exposing it directly to the internet is that it still has to answer the CORS preflight (OPTIONS) requests. So even if the query result is fully cached by Cloudflare, the instance still wakes up to respond to those, adding extra latency if it's sleeping.

It's hard to get around that when sending GET requests, though!
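Why even a fully cached GET wakes the instance comes down to the browser's "simple request" rule. The sketch below is a simplified Python model of the Fetch standard's check: it ignores the `Range` header and the spec's per-value length and byte restrictions, and only looks at headers set by the page, not those the browser adds itself. The custom header name `X-Seafowl-Query` is assumed here to be how the client sends the query.

```python
from typing import Mapping

# CORS-safelisted methods per the Fetch standard.
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}

# Simplified set of CORS-safelisted request headers (Range and the
# spec's per-value length/byte checks are deliberately omitted).
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}

# Content-Type values that keep a request "simple".
SAFELISTED_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, headers: Mapping[str, str]) -> bool:
    """Roughly decide whether a browser sends an OPTIONS preflight first.

    `headers` are the headers set by page script (e.g. via fetch()).
    """
    if method.upper() not in SAFELISTED_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFELISTED_HEADERS:
            return True  # any custom header makes the request non-simple
        if lname == "content-type":
            # Only the type/subtype essence counts; parameters are ignored.
            essence = value.split(";", 1)[0].strip().lower()
            if essence not in SAFELISTED_CONTENT_TYPES:
                return True
    return False
```

With the query in a custom header, `needs_preflight("GET", {"X-Seafowl-Query": "SELECT 1"})` is `True`, which matches the OPTIONS requests described above; a GET carrying only safelisted headers would not be preflighted.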

mildbyte commented 1 year ago

Hey, great to see it's working for you!

The CORS thing is strange: we're meant to send an Access-Control-Max-Age header with the CORS preflight responses, but it looks like Cloudflare treats them as DYNAMIC responses and doesn't cache them.

I'll dig around and see if it's possible to get Cloudflare to cache CORS preflights.
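For reference, a minimal Python sketch of the headers a preflight response carries when it sets `Access-Control-Max-Age`. The function names, the example origin, the allowed headers and the 86400-second value are all illustrative, not Seafowl's actual settings. The per-browser caps (2 hours for Chromium, 24 hours for Firefox, as documented on MDN) may change between browser versions.

```python
from typing import Dict, Iterable

# Per-browser caps on Access-Control-Max-Age as documented on MDN
# (these may change between browser versions).
BROWSER_MAX_AGE_CAP = {"chromium": 7200, "firefox": 86400}


def preflight_headers(origin: str, methods: Iterable[str],
                      headers: Iterable[str], max_age: int) -> Dict[str, str]:
    """Build headers for a (typically 204) response to a CORS preflight."""
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(methods),
        "Access-Control-Allow-Headers": ", ".join(headers),
        # How long, in seconds, the browser may reuse this preflight result.
        "Access-Control-Max-Age": str(max_age),
    }


def effective_max_age(max_age: int, browser: str) -> int:
    """Seconds a given browser would actually cache the preflight for."""
    return min(max_age, BROWSER_MAX_AGE_CAP[browser])
```

Because of these caps, the browser's preflight cache only helps for a while: once an entry expires, the browser sends a fresh OPTIONS request, and that request reaches the origin whenever the CDN in front of it doesn't cache preflights.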

aleda145 commented 1 year ago

Awesome! Let me know if you need anything from me. My project is closed source, but I can share the Cloud Run Terraform code or my Cloudflare settings if it helps!

mildbyte commented 1 year ago

It looks like Cloudflare doesn't cache OPTIONS requests at all, by design. I'm not sure I agree with that, since browsers are happy to cache preflight responses.

I think the options are:

I'll close this issue for now and move the discussion of that to https://github.com/splitgraph/seafowl/issues/55.