[FR] Bulk download between recent checkpoints?

woodruffw commented 7 months ago

I had this idea while playing around with my own monitoring tool, curious to hear what the Rekor folks think 🙂 -- if you think it's too complicated or otherwise not worth the effort please close!

Description

Right now, a real-time log monitor might have an event loop like this:

1. Persist the last observed checkpoint
2. Wait until a new checkpoint appears
3. Audit all entries in the range [old, new)

To do (3), the monitor calls /api/v1/log/entries/retrieve repeatedly for ranges of indices in [old, new), with each call handling at most 10 indices. Checkpoint ranges currently span a few hundred entries in the typical case, so the retrieval loop takes a decent amount of time (and monitoring requires more fallible network round-trips than strictly necessary).
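For concreteness, here is a rough sketch of that retrieval loop. It assumes the checkpoints are represented by their tree sizes, and `post` is a hypothetical callable that sends one request body to /api/v1/log/entries/retrieve and returns the decoded entries. The `logIndexes` field name is my understanding of the request body and should be checked against the Rekor OpenAPI spec.

```python
from typing import Callable, Iterator

MAX_INDICES_PER_CALL = 10  # per-call limit described above


def index_batches(old_size: int, new_size: int,
                  batch: int = MAX_INDICES_PER_CALL) -> Iterator[list[int]]:
    """Yield consecutive lists of log indices covering [old_size, new_size)."""
    if new_size < old_size:
        raise ValueError("new checkpoint is smaller than the old one")
    for start in range(old_size, new_size, batch):
        yield list(range(start, min(start + batch, new_size)))


def fetch_range(old_size: int, new_size: int,
                post: Callable[[dict], list]) -> list:
    """Fetch every entry in [old_size, new_size), one round-trip per batch.

    `post` is a stand-in (hypothetical) for the HTTP call to
    /api/v1/log/entries/retrieve; it takes the JSON body as a dict.
    """
    entries = []
    for indices in index_batches(old_size, new_size):
        # Assumed body shape: {"logIndexes": [...]}
        entries.extend(post({"logIndexes": indices}))
    return entries
```

With a span of 225 entries this makes 23 round-trips, any one of which can fail and force a retry.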

My proposal: For the last N checkpoints (pick N to balance size tradeoffs), Rekor could bundle the entries between adjacent checkpoints into single payloads. These payloads could then be made available via an endpoint like /api/v1/log/entries/retrieve/by-checkpoints (or similar), where the request to that endpoint specifies the checkpoint span.

Pros:

In the "happy" case, this would reduce the number of monitor network requests to Rekor per checkpoint span from O(N) in the number of entries to O(1), making the monitor faster and reducing pressure on Rekor (though this may not be significant anyway)

Cons:

Additional storage requirements on Rekor's side, along with a small amount of server complexity
In the "sad" case (where a monitor is catching up or missed a checkpoint for whatever reason), the network request order degrades back to O(N). This could be addressed with an even more clever "windowing" approach (where Rekor bundles all entries covered by the last N checkpoints into one giant payload and offers ranges over it), but that is even more complicated.
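To make the happy and sad cases concrete, here is a small planning sketch. Everything in it is hypothetical: it assumes checkpoints are identified by strictly ascending tree sizes, that Rekor keeps bundles only for the most recent `bundled_pairs` adjacent checkpoint pairs, and that anything outside those bundles falls back to batches of 10 indices.

```python
def plan_requests(last_seen: int, checkpoints: list[int],
                  bundled_pairs: int, batch: int = 10) -> list[tuple[str, int, int]]:
    """Plan requests covering [last_seen, checkpoints[-1]).

    Returns (kind, start, end) tuples, where kind is "bundle" for a
    (hypothetical) per-checkpoint-pair payload and "batch" for a regular
    retrieval of up to `batch` indices.
    """
    newest = checkpoints[-1]
    pairs = [(s, e) for s, e in zip(checkpoints, checkpoints[1:]) if s < e]
    recent = pairs[-bundled_pairs:] if bundled_pairs > 0 else []
    bundle_end = dict(recent)  # bundle start -> bundle end

    plan = []
    pos = last_seen
    while pos < newest:
        if pos in bundle_end:
            # Happy case: we sit exactly on a bundled checkpoint boundary.
            plan.append(("bundle", pos, bundle_end[pos]))
            pos = bundle_end[pos]
        else:
            # Sad case: batch forward until the next bundle start (or the end).
            boundary = min((s for s in bundle_end if s > pos), default=newest)
            end = min(pos + batch, boundary)
            plan.append(("batch", pos, end))
            pos = end
    return plan
```

A monitor that is exactly one checkpoint behind gets a single "bundle" request; one that fell behind the bundled window pays the per-batch cost only until it reaches the first bundled boundary.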

TL;DR: Rekor could bundle ranges between pairs of recent checkpoints to accelerate a common monitor retrieval pattern. This would reduce network traffic and improve monitor performance, at the cost of some additional storage and server complexity.

haydentherapper commented 7 months ago

Could this instead be a general purpose batch retrieval API, rather than specifically for checkpointing?

I started implementing this a while ago but didn't get a chance to finish. The only thing to deal with is deciding whether the index you're querying by is the "global" log index, meaning you need to handle cross-shard lookups, or the shard-specific index, meaning you need to specify a tree ID too. I would prefer the latter, though it does make the API look different than the other APIs that are shard-agnostic.
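As an illustration of what the cross-shard handling involves, here is a minimal sketch. It assumes a global index is the shard-local index plus the combined length of all earlier shards; the `Shard` type, the tree IDs, and both function names are made up for the example and are not Rekor's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    tree_id: str  # hypothetical identifier for the shard's tree
    length: int   # number of entries currently in the shard


def resolve_global_index(shards: list[Shard], global_index: int) -> tuple[str, int]:
    """Map a global log index to (tree_id, shard-local index)."""
    if global_index < 0:
        raise ValueError("index must be non-negative")
    offset = 0
    for shard in shards:
        if global_index < offset + shard.length:
            return shard.tree_id, global_index - offset
        offset += shard.length
    raise IndexError("index is past the end of the log")


def split_global_range(shards: list[Shard], start: int,
                       end: int) -> list[tuple[str, int, int]]:
    """Split the global range [start, end) into per-shard local ranges."""
    parts = []
    offset = 0
    for shard in shards:
        lo = max(start, offset)
        hi = min(end, offset + shard.length)
        if lo < hi:
            parts.append((shard.tree_id, lo - offset, hi - offset))
        offset += shard.length
    return parts
```

With the shard-specific option, the client supplies the tree ID and local range directly and the server skips this translation; with the global option, a single batch request may fan out into one lookup per shard it touches.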

woodruffw commented 7 months ago

> Could this instead be a general purpose batch retrieval API, rather than specifically for checkpointing?

I think so, yeah! I emphasized checkpointing above because it's what I was looking at for my hacky monitor, but I see no reason why it needs to be constrained to that 🙂

> I would prefer the latter, though it does make the API look different than the other APIs that are shard-agnostic.

That makes sense to me -- my 0.02c is that I don't mind a slightly more complicated/shard-aware client side API if the retrieval performance is worth it!

sigstore / rekor

[FR] Bulk download between recent checkpoints? #2098