thegreatfatzby closed this 3 months ago
The commit time field will be required for every operation, including updates and deletes.
There are two reasons: (1) pubsub-based data delivery is not in order, and (2) internal optimizations for file-based data reading may cause rows to be read out of order. We therefore need to depend on a client-defined time to determine a deterministic order, rather than on a server run-time decision.
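To illustrate why a client-defined time gives a deterministic result, here is a minimal sketch of last-writer-wins by commit time. It is not the KV server's actual implementation; `Mutation`, `commit_time`, and `apply_mutations` are hypothetical names chosen for this example, and it assumes commit times for the same key never tie.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Mutation:
    key: str
    value: Optional[str]  # None stands for a delete (hypothetical convention)
    commit_time: int      # client-assigned time (hypothetical field name)


def apply_mutations(mutations: Iterable[Mutation]) -> Dict[str, str]:
    # Remember the newest commit time seen per key, deletes included, so a
    # late-arriving older update cannot resurrect a deleted key.
    # Assumption: commit times for the same key never tie.
    latest: Dict[str, Tuple[int, Optional[str]]] = {}
    for m in mutations:
        seen = latest.get(m.key)
        if seen is None or m.commit_time > seen[0]:
            latest[m.key] = (m.commit_time, m.value)
    # Drop tombstones from the visible state.
    return {k: v for k, (_, v) in latest.items() if v is not None}


mutations = [
    Mutation("a", "v1", 1),
    Mutation("a", "v2", 2),
    Mutation("b", "x", 1),
    Mutation("b", None, 3),  # delete of "b"
]

# The same rows delivered in two different orders.
in_order = apply_mutations(mutations)
reversed_order = apply_mutations(reversed(mutations))
```

Under these assumptions both delivery orders end in the same state, which is the property the server cannot guarantee if it orders rows by arrival time.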
I see. So, a micro and a macro question:
Also @truemike and @swapnilpandit
Will this be required per row? Or could it somehow be inferred per batch, like from a file, a file name, or other metadata?
Yes. It is required per row.
Technically it could also be inferred from elsewhere, but we try to keep things simple unless there is a strong reason. Given that we already have two ways to ingest data (pubsub, fs) and two data formats (Avro, Riegeli), and may add more in the future, we want to keep the feature matrix as simple as possible.
Will this type of loading be the only type supported?
We're open to suggestions, but this is the only type supported as of now. We design within the constraints of a TEE, which does not persist data across machine restarts. That makes it really hard to make the KV server the source of truth for the data, for two reasons: (1) decisions made by the KV server cannot persist across restarts without great complexity, and (2) a consensus algorithm to make such decisions is also hard under these constraints, so it's much easier for each server to operate independently. Therefore it's much cleaner to let the client control this aspect. I suspect other caching/storage solutions don't need to worry about this as much as we do, but it's always nice to find some inspiration.
Will the commit time field be required for updates or deletes?