Improving usability of subscription

hzxa21 commented 3 months ago

[x] Change the default behavior of DECLARE cursor_name SUBSCRIPTION CURSOR FOR subscription_name to SINCE now() without backfilling historical data.
[x] Change the op column type from int16 to varchar to make it more understandable: 1 -> insert, 2 -> update_insert, 3 -> delete, 4 -> update_delete
[x] While waiting for new subscription data to arrive, change the FETCH behavior from returning an empty result to blocking (with an optional timeout) until new data arrives. This makes it easier for users to develop applications in an event-driven manner. #18107
[ ] While the user session is active (i.e. the client-FE connection is alive), automatically retry querying the log store on retryable errors, including cluster recovery, query stream timeout, and transient network errors between FE and CN.
[ ] Show more information in SHOW CURSORS https://github.com/risingwavelabs/risingwave/pull/18217#discussion_r1730629473
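
For client code that has to work across the op column change, a small normalizer may help. This is only a sketch: the code-to-name mapping is copied from the checklist item above, while `OP_NAMES` and `normalize_op` are hypothetical names, and the exact spelling of the varchar values is assumed to match the names listed in this issue.

```python
# Mapping copied from the checklist above; names below are hypothetical.
OP_NAMES = {1: "insert", 2: "update_insert", 3: "delete", 4: "update_delete"}


def normalize_op(op):
    """Return the textual op name for an old int16 code or a new varchar value."""
    if isinstance(op, int):
        if op not in OP_NAMES:
            raise ValueError(f"unknown op code: {op!r}")
        return OP_NAMES[op]
    # Assumption: varchar values use exactly the spellings listed in the issue.
    if op not in OP_NAMES.values():
        raise ValueError(f"unknown op name: {op!r}")
    return op
```

With this, application logic can branch on the string name only, regardless of which column type the server returns.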

Feel free to post more ideas under this issue.

hzxa21 commented 3 months ago

While the user session is active (i.e. the client-FE connection is alive), automatically retry querying the log store on retryable errors, including cluster recovery, query stream timeout, and transient network errors between FE and CN.
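
To make the retry idea concrete, here is a minimal sketch of retrying a fetch on retryable errors with exponential backoff. It does not describe how the frontend would implement this; `RetryableError`, `fetch_with_retry`, and all parameters are hypothetical, and classifying which real errors count as retryable is left out.

```python
import time


class RetryableError(Exception):
    """Hypothetical marker for errors like cluster recovery or stream timeout."""


def fetch_with_retry(fetch, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fetch() and retry on RetryableError, doubling the delay each time."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * (2 ** attempt))
```

Non-retryable exceptions propagate immediately, so only errors explicitly marked as transient are hidden from the user.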

Recently we found that if a user declares a cursor but doesn't fetch from it frequently, the query stream may remain valid but unpolled for a long time, which may cause the storage epoch to be pinned for a long time. We may extend the above idea to actively shut down an idle query stream and re-create one from the previous position in the log store when the cursor is fetched again.
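
The shut-down-and-re-create idea can be modeled with a toy in-memory reader. This is a sketch under stated assumptions, not the actual design: a Python list stands in for the log store, and `LazyLogStream`, `idle_timeout`, and `open_count` are hypothetical names. The key point is that the read position outlives the stream, so dropping the stream loses no progress.

```python
import time


class LazyLogStream:
    """Toy model: a log reader that is dropped when idle and reopened on demand."""

    def __init__(self, log, idle_timeout, clock=time.monotonic):
        self.log = log                  # list standing in for the log store
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.pos = 0                    # next position to read; survives shutdown
        self.stream = None              # active iterator, or None when shut down
        self.last_fetch = clock()
        self.open_count = 0             # how many times a stream was (re)created

    def shutdown_if_idle(self):
        """Drop the stream (and whatever it pins) once it has been idle too long."""
        idle = self.clock() - self.last_fetch
        if self.stream is not None and idle >= self.idle_timeout:
            self.stream = None

    def fetch(self):
        """Return the next entry, re-creating the stream from self.pos if needed."""
        if self.stream is None:
            self.stream = iter(self.log[self.pos:])
            self.open_count += 1
        self.last_fetch = self.clock()
        item = next(self.stream, None)
        if item is not None:
            self.pos += 1
        return item
```

In this model a background task would call `shutdown_if_idle()` periodically; the next `fetch()` transparently resumes from the saved position.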

lmatz commented 2 months ago

https://docs.risingwave.com/docs/current/subscription/#persisting-the-consumption-progress requires users to persist the progress by themselves. We can let RW handle this internally by exposing an "ack" function to users.

As for what RW would do underneath the "ack", it would be something like:

cur.execute("INSERT INTO subscription_progress (sub_name, progress) VALUES (%s, %s)", (sub_name, progress))
cur.execute("FLUSH")

I wonder if there could be cases (e.g. when barriers have already piled up in the system) where "flush" takes a non-negligible amount of time, so that "exactly-once delivery" and getting the latest results at low latency via subscription cannot both be achieved at the same time.
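
One way to make this concern measurable is to wrap the two statements in an "ack" helper that reports how long the flush took. This is a sketch only: `ack` is a hypothetical name, the `subscription_progress` table is assumed to exist as in the snippet above, and `cur` is assumed to be a DB-API style cursor that accepts `%s` placeholders (as psycopg2 does).

```python
import time


def ack(cur, sub_name, progress, clock=time.monotonic):
    """Persist the progress, then FLUSH; return the seconds spent in FLUSH."""
    cur.execute(
        "INSERT INTO subscription_progress (sub_name, progress) VALUES (%s, %s)",
        (sub_name, progress),
    )
    start = clock()
    cur.execute("FLUSH")  # blocks until the write is visible
    return clock() - start
```

If the returned latency grows when barriers pile up, the caller can see directly how much the ack delays the next fetch.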

risingwavelabs / risingwave

Improving usability of subscription #18208