linkedin / venice

Venice, Derived Data Platform for Planet-Scale Workloads.
https://venicedb.org
BSD 2-Clause "Simplified" License

[Feature] Venice Push Status System Store needs a stronger existence guarantee since it is a critical piece of the push job #657

Open ZacAttack opened 1 year ago

ZacAttack commented 1 year ago

Willingness to contribute

No. I cannot contribute a bug fix at this time.

Feature Request Proposal

Right now, creation of the Venice Push Status System Store appears to be best-effort in prod: some stores have one and some do not.

However, polling the DaVinci Push Status System Store is on the critical path of the push job's offline status polling, so if a store's push status system store does not exist, the push job will never succeed.

One idea is to add a monitoring service to the Controller that periodically checks the health of the push status system store for each Venice store and repairs it automatically.
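A minimal sketch of the monitoring idea, assuming a hypothetical `PushStatusStoreAdmin` interface (`listUserStores`, `pushStatusStoreExists`, `createPushStatusStore`) that stands in for whatever the Controller actually uses; none of these names come from Venice. Each pass checks every user store and tries to recreate any missing push status system store, and a `ScheduledExecutorService` drives the periodic runs. The in-memory admin is a hypothetical fake for trying it out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical admin abstraction: these method names are illustrative, not Venice's API.
interface PushStatusStoreAdmin {
  List<String> listUserStores();
  boolean pushStatusStoreExists(String storeName);
  void createPushStatusStore(String storeName) throws Exception;
}

// One pass of the periodic health check: find user stores whose push status
// system store is missing and try to recreate it.
class PushStatusStoreRepairTask implements Runnable {
  private final PushStatusStoreAdmin admin;
  private final List<String> lastRepaired = new ArrayList<>();
  private final List<String> lastFailed = new ArrayList<>();

  PushStatusStoreRepairTask(PushStatusStoreAdmin admin) {
    this.admin = admin;
  }

  @Override
  public synchronized void run() {
    lastRepaired.clear();
    lastFailed.clear();
    try {
      for (String store : admin.listUserStores()) {
        if (admin.pushStatusStoreExists(store)) {
          continue; // healthy, nothing to do
        }
        try {
          admin.createPushStatusStore(store);
          lastRepaired.add(store);
        } catch (Exception e) {
          // Record and move on so one bad store does not block the others.
          lastFailed.add(store);
        }
      }
    } catch (RuntimeException e) {
      // A real service would log this; swallowing it keeps future scheduled runs alive.
    }
  }

  synchronized List<String> lastRepaired() { return new ArrayList<>(lastRepaired); }
  synchronized List<String> lastFailed() { return new ArrayList<>(lastFailed); }

  // Run the check immediately, then with a fixed delay, on one background thread.
  static ScheduledExecutorService schedule(PushStatusStoreRepairTask task, long periodSeconds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(task, 0, periodSeconds, TimeUnit.SECONDS);
    return executor;
  }
}

// Hypothetical in-memory fake for exercising the task without a cluster.
class InMemoryPushStatusStoreAdmin implements PushStatusStoreAdmin {
  private final List<String> userStores;
  private final Set<String> existing = new HashSet<>();
  private final Set<String> failing = new HashSet<>();

  InMemoryPushStatusStoreAdmin(List<String> userStores) {
    this.userStores = new ArrayList<>(userStores);
  }

  void markExisting(String storeName) { existing.add(storeName); }
  void failCreationFor(String storeName) { failing.add(storeName); }

  @Override
  public List<String> listUserStores() { return new ArrayList<>(userStores); }

  @Override
  public boolean pushStatusStoreExists(String storeName) { return existing.contains(storeName); }

  @Override
  public void createPushStatusStore(String storeName) throws Exception {
    if (failing.contains(storeName)) {
      throw new Exception("simulated creation failure for " + storeName);
    }
    existing.add(storeName);
  }
}
```

For example, `PushStatusStoreRepairTask.schedule(new PushStatusStoreRepairTask(admin), 300)` would run a pass every five minutes (the interval is an arbitrary assumption). Catching exceptions inside `run()` matters here because a `ScheduledExecutorService` suppresses all later executions of a periodic task once one execution throws.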

Motivation

What is the use case for this feature?

This is to improve write path availability.

Details

No response

What component(s) does this bug affect?

ZacAttack commented 4 months ago

In the controller's addVersion call, check that the push status system store exists for the store. If it does not, try to create it; if creation fails, fail the addVersion call to abort the push.
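A sketch of the proposed addVersion precheck, assuming a hypothetical `SystemStoreOps` interface and a hypothetical `PushStatusStoreUnavailableException`; neither is part of Venice, and the real controller's addVersion path will differ. The guard is meant to run before a new version is created: it returns if the push status system store exists, tries to create it otherwise, and throws if creation fails, which is what would let the controller reject addVersion and abort the push. The re-check after creation is an extra defensive assumption, not something the issue specifies.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical controller-side operations; names are illustrative only.
interface SystemStoreOps {
  boolean pushStatusStoreExists(String storeName);
  void createPushStatusStore(String storeName) throws Exception;
}

// Hypothetical exception used to reject the addVersion call and abort the push.
class PushStatusStoreUnavailableException extends RuntimeException {
  PushStatusStoreUnavailableException(String message, Throwable cause) {
    super(message, cause);
  }
}

final class AddVersionPrecheck {
  private final SystemStoreOps ops;

  AddVersionPrecheck(SystemStoreOps ops) {
    this.ops = ops;
  }

  // Call at the start of addVersion, before any version is created.
  void ensurePushStatusStore(String storeName) {
    if (ops.pushStatusStoreExists(storeName)) {
      return; // store is present, let addVersion proceed
    }
    try {
      ops.createPushStatusStore(storeName);
    } catch (Exception e) {
      throw new PushStatusStoreUnavailableException(
          "Cannot create push status system store for " + storeName + "; aborting push", e);
    }
    // Defensive re-check (an assumption): creation returned but the store is still absent.
    if (!ops.pushStatusStoreExists(storeName)) {
      throw new PushStatusStoreUnavailableException(
          "Push status system store for " + storeName + " still missing after creation", null);
    }
  }
}

// Hypothetical fake used to exercise the precheck.
class FakeSystemStoreOps implements SystemStoreOps {
  private final Set<String> existing = new HashSet<>();
  private final Set<String> failing = new HashSet<>();

  void failCreationFor(String storeName) { failing.add(storeName); }

  @Override
  public boolean pushStatusStoreExists(String storeName) { return existing.contains(storeName); }

  @Override
  public void createPushStatusStore(String storeName) throws Exception {
    if (failing.contains(storeName)) {
      throw new Exception("simulated creation failure for " + storeName);
    }
    existing.add(storeName);
  }
}
```

Failing addVersion at this point surfaces the problem when the push starts, rather than leaving the push job polling a push status store that will never report completion.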