jamsocket / y-sweet

A standalone yjs server with persistence to S3 or filesystem.
https://docs.y-sweet.dev

Cloudflare Worker y-sweet server is causing huge runtime cost. #203

Open timeswind opened 8 months ago

timeswind commented 8 months ago

The y-sweet Cloudflare Worker uses the Cloudflare Durable Object WebSocket connection feature. But the y-sweet server is built in Rust, and the Rust bindings for Cloudflare Workers Durable Objects don't have the latest Hibernatable WebSockets API, which uses a custom acceptWebSocket in place of the original accept function.

With the original accept function, a WebSocket connection incurs long-running Cloudflare Durable Object runtime fees, so it is not suitable for large-scale usage. In fact, even a single WebSocket connection can cause large runtime fees: I tested this on my personal Cloudflare account, and it triggered the next pricing tier very easily.

https://developers.cloudflare.com/durable-objects/api/websockets/#acceptwebsocket

paulgb commented 7 months ago

Hi @timeswind,

Unfortunately the hibernate API, even if available, would be nontrivial to implement, because only ~2MB of data can be kept in-memory across hibernation, which is small for a Yjs document. We could potentially spread the document across multiple keys in the Durable Object KV, but it would involve adding a bunch of Cloudflare-specific code, which is the opposite of the direction I'd like to go (especially given how long a correctness bug we found has gone completely unacked).

I'm curious what prices you're seeing. My math is that a DO held open for a whole day costs about 14 cents. Are you operating at a scale where that is untenable, or am I doing my math wrong?
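For anyone who wants to check the 14-cent figure, here is a rough sketch of the arithmetic. It assumes (neither number is stated in this thread) that Durable Object duration is billed at 128 MB per object and $12.50 per million GB-seconds, and it ignores any included free allowance; the function names are made up for illustration.

```python
# Assumptions, NOT taken from this thread: duration is billed at 128 MB per
# Durable Object and $12.50 per million GB-seconds, with no free allowance.
ASSUMED_MEMORY_GB = 128 / 1024            # 0.125 GB per object
ASSUMED_USD_PER_MILLION_GB_S = 12.50
SECONDS_PER_DAY = 24 * 60 * 60


def gb_seconds(open_seconds, objects=1):
    """Billable GB-seconds for `objects` Durable Objects held open this long."""
    return open_seconds * ASSUMED_MEMORY_GB * objects


def duration_cost_usd(open_seconds, objects=1):
    """Duration charge in USD under the assumed rate."""
    return gb_seconds(open_seconds, objects) * ASSUMED_USD_PER_MILLION_GB_S / 1_000_000


# One Durable Object held open for a full day.
daily_cost = duration_cost_usd(SECONDS_PER_DAY)
print(f"{gb_seconds(SECONDS_PER_DAY):.0f} GB-s per day, about ${daily_cost:.3f}")
```

Under those assumptions one object held open for a day comes to 10,800 GB-s, or roughly 13.5 cents, which is in line with the "about 14 cents" estimate.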

timeswind commented 7 months ago

@paulgb Thanks for your reply. I was amazed when I saw the y-sweet project, which could save developers a lot of pain in building something close to the "Figma" experience, and it really fits the use case that Cloudflare Durable Objects promised.

Following the instructions, I deployed a test project, modifying only the auth_key, and used a Yjs & ProseMirror React component to verify it. Everything worked until I saw the abnormal usage in the Cloudflare dashboard.

[Two screenshots of the usage statistics, captured 2024-02-10]

At first I was not sure what really caused the issue. After looking into the Cloudflare docs, I realized that the missing implementation of the hibernate API in workers-rs could be the problem. As you can see from the statistics above, the test environment, with only a few requests, ended up with millions of Durable Object GB-sec of duration.

paulgb commented 7 months ago

Hmm, I don't think the lack of hibernate is the issue here. Using hibernate (even if it were available on the wasm side) would require that the in-memory state be frozen and re-hydrated on every WebSocket request, which is not a good fit for a stateful service like y-sweet.

I can't make sense of these numbers, though. I think either:

  1. These 7 requests were each WebSocket connections to unique durable objects and were held open for on average 2.7 days (!), in which case the total duration incurred is correct
  2. There is a bug in Cloudflare's metering where duration is incurred even after the WebSocket is closed
  3. I am either reading the docs wrong or doing my math wrong

If you think it's 2, it might be worth bringing up with Cloudflare. I'm certainly open to the possibility that it's 3, but I've checked my work and it seems to match Cloudflare's own calculation method.
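To sanity-check option 1 against a dashboard total, the calculation can be run in reverse: divide the billed GB-seconds by the per-object memory and the number of connections to get the average time each one must have stayed open. This is only a sketch; it assumes (not stated in this thread) that duration is billed at 128 MB per object, and `implied_avg_days_open` is a made-up helper name. The total used below is derived from the "7 connections, 2.7 days" scenario as a round trip, not read from the screenshots.

```python
# Assumption, NOT taken from this thread: 128 MB billed per Durable Object.
ASSUMED_MEMORY_GB = 128 / 1024
SECONDS_PER_DAY = 24 * 60 * 60


def implied_avg_days_open(total_gb_seconds, connections):
    """Average days each connection was open, given total billed GB-seconds."""
    seconds_per_connection = total_gb_seconds / ASSUMED_MEMORY_GB / connections
    return seconds_per_connection / SECONDS_PER_DAY


# Round trip: the GB-seconds that scenario 1 (7 connections, 2.7 days each)
# would produce under the assumption above, then back to days.
scenario_gb_s = 7 * 2.7 * SECONDS_PER_DAY * ASSUMED_MEMORY_GB
print(f"scenario 1 implies about {scenario_gb_s:,.0f} GB-s")
print(f"average days open: {implied_avg_days_open(scenario_gb_s, 7):.2f}")
```

Plugging the actual dashboard total and request count into `implied_avg_days_open` shows whether the billed duration corresponds to a plausible connection lifetime or points toward option 2 or 3.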