Open woodruffw opened 7 months ago
Could this instead be a general purpose batch retrieval API, rather than specifically for checkpointing?
I had started implementing this a while ago but didn't get a chance to finish. The only thing to deal with is deciding whether the index you're querying by is the "global" log index, meaning you need to handle cross-shard lookups, or the shard-specific index, meaning you need to specify a tree ID too. I would prefer the latter, though it does make the API look different than the other APIs that are shard-agnostic.
Could this instead be a general purpose batch retrieval API, rather than specifically for checkpointing?
I think so, yeah! I emphasized checkpointing above because it's what I was looking at for my hacky monitor, but I see no reason why it needs to be constrained to that 🙂
I would prefer the latter, though it does make the API look different than the other APIs that are shard-agnostic.
That makes sense to me -- my 0.02c is that I don't mind a slightly more complicated/shard-aware client side API if the retrieval performance is worth it!
I had this idea while playing around with my own monitoring tool, curious to hear what the Rekor folks think 🙂 -- if you think it's too complicated or otherwise not worth the effort please close!
Description
Right now, a real-time log monitor might have an event loop like this:
[old, new)
To do (3), the monitor calls /api/v1/log/entries/retrieve repeatedly for ranges of indices in [old, new), with each call handling a maximum of 10 indices. Current typical checkpoint ranges include a few hundred entries, meaning that the retrieval loop takes a decent amount of time (and that monitoring requires more fallible network round-trips than strictly necessary).

My proposal: For the last N checkpoints (pick N to balance size tradeoffs), Rekor could bundle the entries between adjacent checkpoints into single payloads. These payloads could then be made available via an endpoint like /api/v1/log/entries/retrieve/by-checkpoints (or similar), where the request to that endpoint specifies the checkpoint span.

Pros:
O(N) to O(1), making the monitor faster and reducing pressure on Rekor (this may not be significant anyways)

Cons:
O(N). This could be addressed through an even more clever "windowing" approach (where Rekor bundles the entire last N checkpointed entries into one giant payload and offers ranges over it), but this is even more complicated.

TL;DR: Rekor could bundle ranges between pairs of recent checkpoints to accelerate a common monitor retrieval pattern. This would reduce network traffic and improve monitor performance, at the cost of some additional storage and server complexity.
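
For a rough sense of the round-trip count in the current retrieval loop, here is a minimal Python sketch that splits [old, new) into batches of at most 10 indices and builds one request body per batch. It makes no network calls. The "logIndexes" field name is my assumption about the request body for /api/v1/log/entries/retrieve, and the 1000 to 1250 range is a made-up checkpoint span; the function names are hypothetical too.

```python
from typing import Iterator

# Per-call limit described in this issue.
MAX_INDICES_PER_REQUEST = 10


def batch_ranges(
    old: int, new: int, batch_size: int = MAX_INDICES_PER_REQUEST
) -> Iterator[list[int]]:
    """Yield consecutive batches of log indices covering the half-open range [old, new)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(old, new, batch_size):
        yield list(range(start, min(start + batch_size, new)))


def retrieve_request_bodies(old: int, new: int) -> list[dict]:
    """Build one request body per batch.

    The "logIndexes" key is an assumption, not confirmed by this issue.
    """
    return [{"logIndexes": batch} for batch in batch_ranges(old, new)]


if __name__ == "__main__":
    # Hypothetical checkpoint span of 250 entries.
    bodies = retrieve_request_bodies(1000, 1250)
    print(f"{len(bodies)} requests needed for 250 entries")
```

With a 250-entry span this comes out to 25 separate requests, which is the per-checkpoint cost that a single bundled payload would collapse into one request.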