mostrecent opened 8 months ago

Before you think this is unrelated: the team maintaining this repo is the closest to this topic and is welcome to comment. CF released Hyperdrive for connecting to DBs without the communication overhead. They say:

> By maintaining a connection pool to your database within Cloudflare's network, Hyperdrive reduces seven round-trips to your database before you can even send a query: the TCP handshake (1x), TLS negotiation (3x), and database authentication (3x). [1]

So, my questions:

1. CF Workers paired with globally read-replicated PS is already fast, but I still wonder if you do all that handshaking, etc. I guess you do, because your driver is HTTPS-based.
2. Is there already a team at PS working with CF on enabling PS through CF Hyperdrive?

More context/motivation: With PS read-replicas, I'd get my DB nodes closer to CF Worker nodes than with Postgres-based providers such as Neon. With the latter plus Hyperdrive, though, I would save all the communication overhead and could imagine faster performance for most single-query page requests despite having only one location (e.g. US-East). I might be wrong, so I'm curious about your opinion.

[1] https://developers.cloudflare.com/hyperdrive/get-started/
Just to touch base here: I'm not very sure what Hyperdrive does or doesn't do. It is definitely highly specific to database drivers, but I don't know whether similar optimizations, like connection pooling, already existed implicitly for HTTP connections. I'd suspect they optimized that first and are likely leveraging preexisting internal CF infrastructure to make it optimal, but that's just a hunch on my end.
For us over HTTP, there are a few fundamental differences.
In theory, if fetch() is optimized within CF to persist connections, this should be extremely optimal already. But I can't say, since I don't know what CF is doing internally.
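For reference, here's roughly what the HTTP path looks like in a Worker. This is only a minimal sketch (the env/binding names are placeholders), but it shows why the question matters: every query is a plain HTTPS request via fetch(), so any connection reuse would happen inside CF's fetch() rather than in the driver:

```ts
import { connect } from '@planetscale/database'

export default {
  async fetch(request: Request, env: { DATABASE_URL: string }): Promise<Response> {
    // Each execute() is an HTTPS call under the hood; the driver owns no
    // TCP/TLS handshake itself, CF's fetch() handles the transport.
    const conn = connect({ url: env.DATABASE_URL })
    const { rows } = await conn.execute('SELECT 1')
    return Response.json(rows)
  },
}
```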
> CF Workers paired with globally read-replicated PS is already fast, but I still wonder if you do all that handshaking, etc. I guess you do, because your driver is HTTPS-based.
We do less handshaking, as I mentioned, and the handshaking we do happens geographically as close as possible. (I'm planning to document and write up how we do this very soon.) PlanetScale is set up like a CDN as well, so the TCP+TLS handshake happens as close to the client as possible, regardless of where your database lives geographically.
> Is there already a team at PS working with CF on enabling PS through CF Hyperdrive?
Nope, this is the first we're hearing about it, but I'm really curious how this compares to the HTTP connection we use, or whether it would become more favorable to use mysql2 instead, purely for performance.
I'd love to do some actual testing and comparisons. I am biased towards our HTTP API (I'd suggest using it outside of just serverless contexts too, since it brings other improvements and benefits anyway), so if the HTTP path is just as optimal in CF, that'd be ideal. If the MySQL protocol path is getting special treatment and does end up being faster, that'd be really interesting.
I'd love to talk/work with someone at CF to get some concrete answers.
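For comparison's sake, the mysql2 path through Hyperdrive would presumably look something like the sketch below. This assumes a Hyperdrive binding named HYPERDRIVE configured in wrangler (the Hyperdrive type comes from @cloudflare/workers-types), plus MySQL support on Hyperdrive's side, which per the docs linked later in this thread is still "coming soon":

```ts
import { createConnection } from 'mysql2/promise'

export default {
  async fetch(request: Request, env: { HYPERDRIVE: Hyperdrive }): Promise<Response> {
    // The binding points at Hyperdrive's pooler, not the origin database,
    // so the TCP/TLS/auth setup should already be paid for.
    const conn = await createConnection({
      host: env.HYPERDRIVE.host,
      user: env.HYPERDRIVE.user,
      password: env.HYPERDRIVE.password,
      database: env.HYPERDRIVE.database,
      port: env.HYPERDRIVE.port,
      disableEval: true, // Workers disallow eval(), which mysql2 uses by default
    })
    const [rows] = await conn.query('SELECT 1')
    return Response.json(rows)
  },
}
```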
> With PS read-replicas, I'd get my DB nodes closer to CF Worker nodes than with Postgres-based providers such as Neon. With the latter plus Hyperdrive, though, I would save all the communication overhead and could imagine faster performance for most single-query page requests despite having only one location (e.g. US-East). I might be wrong, so I'm curious about your opinion.
I don't really think Hyperdrive would solve this better than what we already do, for the reasons I mentioned. We do a lot of the same things to achieve similar optimizations, including terminating the handshakes as close as possible. So while there is some chatter to establish a TCP connection, that chatter should ideally cover a very short distance, and hopefully CF already does implicit connection pooling for fetch() in their infra.
So I'd be curious how this works in practice.
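To put rough numbers on the "short distance" point, here's the arithmetic behind the seven setup round-trips quoted in the original post. The RTT values are made up for illustration, not measurements:

```ts
// Connection setup costs 7 round-trips (1x TCP + 3x TLS + 3x DB auth),
// so what matters is the RTT over which those round-trips happen.
const setupRoundTrips = 1 + 3 + 3

for (const [where, rttMs] of [
  ['handshake terminated nearby (edge)', 2], // assumed RTT, illustrative
  ['handshake to a distant origin', 40],     // assumed RTT, illustrative
] as const) {
  console.log(`${where}: ~${setupRoundTrips * rttMs} ms before the first query`)
}
// nearby: ~14 ms of setup; distant: ~280 ms of setup
```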
Ok, so https://twitter.com/elithrar/status/1767270517180916059
My hunch here is that our HTTP side is mostly unaffected and what I mentioned stands true. I would suspect Hyperdrive makes the mysql2 library more comparable in performance, but I'm not sure it'd make it better, just due to the fundamental differences still.
I'd be curious to see some real-world benchmarks, but I think this is mostly irrelevant if you're using this driver.
Thanks for the long reply! Yeah, after checking Hyperdrive a few more times, I think it's about enabling TCP-connected DBs on CF's infra, kind of a turn-key Connect (their generic way of enabling TCP services on CF). And since you guys already do some voodoo over HTTP instead of TCP, we might not need this for PS.
> if fetch() is optimized within CF to persist connections
It's not. Their core product, Workers, runs V8 isolates on demand, and even when an isolate is still "warm", CPU time is capped at 10ms (free) or 30s (paid). And since there are 280 nodes, it's usually a fresh Worker responding anyway. The Workers product is actually quite nice: no cold start, genuinely deployed around the world on all 280 nodes, so no caching needed and it's extremely responsive, usually at most ~40ms to everyone on this planet (without a DB query, of course).
So, as long as one is careful with how many DB queries are sent within one user/page request, you can live with the latency, but it isn't great. PS read-replicas help here, but tbf there are many requests which have to go to the main DB (all writes, and reads after writes), and you still have all the init overhead on each request.
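One cheap way to be careful with how many round-trips a page request pays, sketched below with hypothetical table names: when the queries within a request are independent, issue them in parallel, so the request costs roughly one round-trip of DB latency instead of one per query.

```ts
import { connect, type Connection } from '@planetscale/database'

// Independent queries fired together: the request pays ~1 round-trip of DB
// latency for the whole set instead of one round-trip per awaited query.
async function loadPage(conn: Connection, userId: number) {
  const [user, posts] = await Promise.all([
    conn.execute('SELECT * FROM users WHERE id = ?', [userId]),      // hypothetical table
    conn.execute('SELECT * FROM posts WHERE user_id = ?', [userId]), // hypothetical table
  ])
  return { user: user.rows, posts: posts.rows }
}
```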
> Ok, so https://twitter.com/elithrar/status/1767270517180916059
Thanks for this. Their Smart Placement feature should minimize round-trips to the DB within one request, which is great. But surprisingly, it doesn't work with external DBs, just D1; I only skimmed their docs, but that's what I found.
In general, looking at all the benchmarks and at my own experience (at some point I tried every DB you can connect to CF via HTTP), PS is really good and fast. But if you compare it to a local VPS hosting both the app and the DB server, where round-trips are in the single-digit milliseconds, it's a different story. You can also scale a VPS quite far. Not that this is my favorite setup (I don't want to self-host, and it isn't HA either), but the gap to something like PS is huge.
But coming back to the one crucial point: you say that if my app server could persist the connection, requests would be much faster because we wouldn't have all the init ceremony, right?
Because then I should host my app server as a long-running process, close to the main/primary of my PS database. Since I cannot pin CF Workers to one region, and because they are not long-running, my only option is a VPS running Node (or another app server), because PaaS providers such as Vercel, Netlify, or CF offer short-lived (edge) functions only. Or am I missing something?
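That long-running option would look roughly like this minimal sketch (the connection string and port are placeholders): a Node process near the PS primary keeps a mysql2 pool open, so requests reuse already-established connections and skip the TCP/TLS/auth setup entirely.

```ts
import { createPool } from 'mysql2/promise'
import { createServer } from 'node:http'

// The pool is created once at process start and lives as long as the server.
const pool = createPool(process.env.DATABASE_URL!) // e.g. mysql://user:pass@host/db

createServer(async (req, res) => {
  // Requests borrow an already-connected socket: no per-request handshakes.
  const [rows] = await pool.query('SELECT 1')
  res.setHeader('content-type', 'application/json')
  res.end(JSON.stringify(rows))
}).listen(3000)
```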
> it doesn't work with external DBs, just D1; I only skimmed their docs, but that's what I found.
OK, I just checked again: it does work with external DBs and PlanetScale, but only if the service isn't globally distributed, e.g. if PS doesn't have read-replicas in other regions (which is actually a good feature).
So I can effectively pin CF Workers just by enabling Smart Placement, but only with non-distributed DBs. And I'd still suffer the init overhead on every request with Workers.
One more thing, FWIW: the Hyperdrive docs say MySQL support is coming soon (source: https://developers.cloudflare.com/hyperdrive/).
So there's a version for MySQL coming too. As far as I understood, Hyperdrive is a dedicated long-lived connection pooler that keeps the connection to the DB open, hence removing the init overhead, and communicates with Workers via TCP sockets.
Which sounds not too bad. Reaching out to them might be a good idea; they are usually pretty responsive on their Discord.
Edit: There's a good chart in their docs illustrating this.
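As an aside on the "TCP sockets" part: Workers expose raw outbound TCP through the connect() API from cloudflare:sockets, which is presumably the channel a Hyperdrive-backed driver would use. A minimal sketch follows (hostname and port are placeholders, and the payload is a stand-in, not a real wire protocol):

```ts
import { connect } from 'cloudflare:sockets'

export default {
  async fetch(): Promise<Response> {
    // Open a raw TCP socket from the Worker; a real driver would speak the
    // MySQL/Postgres wire protocol over this, not plain text.
    const socket = connect({ hostname: 'pooler.example.com', port: 5432 })
    const writer = socket.writable.getWriter()
    await writer.write(new TextEncoder().encode('ping'))
    writer.releaseLock()
    await socket.close()
    return new Response('ok')
  },
}
```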