npadbidri opened 3 months ago
Sorry for the inconvenience! Let us take a look.
cc @xxchan. Can you elaborate on
inconsistent state between the RW Database (durable state) and RW REDIS (in-memory cache)
I didn't get what RW REDIS (in-memory cache) is. Does it mean a Redis sink?
It's source_fragments in the source manager being outdated and inconsistent with the table fragments info.
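To make the failure mode above concrete, here is a minimal sketch of a reconciliation check that compares a cached source-to-fragments mapping against the durable table-fragment mapping and reports where they diverge. The function name, the dict-of-sets shapes and the integer ids are hypothetical illustrations, not RisingWave's actual source manager types or API.

```python
from typing import Dict, Set

# Hypothetical shapes: source id -> set of fragment ids.
# "cached" stands in for the in-memory source_fragments view,
# "durable" for the table fragments info in the meta store.
def find_stale_sources(
    cached: Dict[int, Set[int]], durable: Dict[int, Set[int]]
) -> Dict[int, Set[int]]:
    """Return, per source id, the fragment ids present in only one of the two views."""
    stale: Dict[int, Set[int]] = {}
    for source_id in cached.keys() | durable.keys():
        # Symmetric difference: fragments missing from either side.
        diff = cached.get(source_id, set()) ^ durable.get(source_id, set())
        if diff:
            stale[source_id] = diff
    return stale
```

For example, `find_stale_sources({1: {10, 11}, 2: {20}}, {1: {10, 12}, 2: {20}})` flags only source 1, with fragments `{11, 12}`. A non-empty result is the kind of signal that could be logged or surfaced as an alert instead of letting scheduling stop silently.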
Describe the bug
As per discussions with Tianxiao Shen, the Source Manager component in RisingWave had stopped scheduling the job due to an inconsistent state between the RW Database (durable state) and RW REDIS (in-memory cache). Thus, we were asked to perform ANY or ALL of the 3 options:
Given that we will have RW sinks numbering in the thousands, this anomaly would be catastrophic in production. Can you please fix this bug, and also let me know whether we could have an alerting mechanism for such scenarios?
Error message/log
No errors were reported in our RW Cloud Portal, and we did not get any exceptions programmatically either. This is the most concerning part: in a production scenario with about 2000 sinks running, ALL of them could fail silently without us being alerted.
This directly means loss of revenue!!!
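One generic way to catch this kind of silent failure is an external watchdog that alerts when a sink's progress stops advancing, regardless of whether any error was raised. The sketch below assumes you can periodically sample a per-sink "last progress" timestamp (in Unix seconds) from some metric you already collect; the metric, the function name and the threshold are hypothetical and are not an existing RisingWave feature. It shows only the staleness check itself.

```python
from typing import Dict, List

def stalled_sinks(
    last_progress_ts: Dict[str, float], now: float, max_idle_secs: float
) -> List[str]:
    """Return the sorted names of sinks whose progress has not advanced within max_idle_secs.

    last_progress_ts maps a (hypothetical) sink name to the Unix timestamp
    at which its progress was last observed to advance.
    """
    return sorted(
        name
        for name, ts in last_progress_ts.items()
        if now - ts > max_idle_secs
    )
```

For example, with `now=1000.0` and `max_idle_secs=300.0`, a sink last seen at `100.0` is reported while one last seen at `950.0` is not. Note that a sink whose upstream source is genuinely idle would also be flagged, so in practice the threshold (or an upstream-activity check) would need tuning per workload.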
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
RisingWave Cloud
The version of RisingWave
v1.10.0-rc.1-patch-us-west-2-11-type-mismatch-patch
Additional context