spiffe / spire

The SPIFFE Runtime Environment
https://spiffe.io
Apache License 2.0
1.8k stars 474 forks source link

Filter cache update reads to only read each entry once, when an entry has two events in the same refresh cycle. #5349

Closed edwbuck closed 2 weeks ago

edwbuck commented 3 months ago

This is a small optimization of the db event framework for a specific scenario.

When an entry is modified twice within a polling cycle, we currently handle each modification (event) independently. This means that the cache updating read of the entry will read the entry twice, once for the first update and once for the second update.

As the first update's read would likely read the record as it exists after the second update had been applied, an optimization of reading the entry once, instead of twice, should provide the same outcomes, with one less record combing back in the database request.

For this to occur, we would have to take the list of the entries to be updated, and deduplicate them. A number of relatively fast methods exist to do this, so the cost to do this might be smaller than the cost of returning a duplicate record. However, if we ever want fidelity on "count of events processed" tracking to "count of cache updates" this optimization will break 1 to 1 event tracing.

amoore877 commented 3 months ago

For this to occur, we would have to take the list of the entries to be updated, and deduplicate them. A number of relatively fast methods exist to do this, so the cost to do this might be smaller than the cost of returning a duplicate record.

yeah my initial thoughts was just select unique, but I guess we have to be careful about making sure we can tie back all retrieved event IDs too. Like if 10 event IDs all touched one entity, we only really care that some time later we retrieved the data for that one entity.

like if I have

missedAgentEventsMap: {
"1": agentAID
"5": agentBID
"10": agentAID
} 

and it's time to look up missed events, then first thing that happens (assuming something like #5342 is addressed):

select * where eventID in (1, 5, 10)

which returns back, if successful, a struct essentially equivalent to the missedAgentEventsMap above

missedEventsFound: [
{event: "1", agentID: agentAID},
{event:"5", agentID: agentBID},
{event:"10", agentID: agentAID},
]

we de-dupe this, but I was thinking into something like an inverse of the original map:

agentIDsToQueryMap: {
agentAID: [1, 10],
agentBID: [5],
}

next we query the agent table for each key in the query map above. if all cache update ops are successful for agentBID, we remove 5 from missedAgentEventsMap if all cache update ops are successful for agentAID, we remove both 1 and 10 from missedAgentEventsMap

sorindumitru commented 2 months ago

I think this might already be handled. There's a seenMap used during the polling: https://github.com/spiffe/spire/blob/4f34e4388078ace465d73c324b60835f0a680396/pkg/server/endpoints/authorized_entryfetcher_registration_entries.go#L122 It looks like we only poll the db the first time we see an entry/node id.

edwbuck commented 2 months ago

@sorindumitru I wish it was already deduplicated, but there will be a small code change to deduplicate the query. What you are seeing is that "change event" being deduplicated. The issue described above is when two different change events refer to the same EntryID, the EntryID should be fetched only once.

This will involve a map, and a second loop. We process the items once (as above) to see if we skip items, and to add them to a fetching map, instead of fetching the directly.

Then we fetch from the fetching map, which will be keyed by the EntryID (not the EventID) which will de-duplicate the fetching.

I'm working on code for this now, and will have a PR ready soon.

sorindumitru commented 2 months ago

Isn't it done by, entry id at the moment? So if the same entry id is seen twice in an update we'll only fetch the entry once from the database, see for example https://github.com/spiffe/spire/blob/4f34e4388078ace465d73c324b60835f0a680396/pkg/server/endpoints/authorized_entryfetcher_registration_entries.go#L140

azdagron commented 2 weeks ago

I agree, @sorindumitru. Seems that we're already deduplicating the handling of events for a given entry/agent. It is however, scoped to only events retrieved from normal event polling (e.g. the List*Events functions). It is possible that a given entry/agent ends up getting polled twice in the same cycle if there is both a "skipped event" successfully polled, and one or more regular events for that entry/agent, though I'd expect that to not be something that happens frequently.

edwbuck commented 2 weeks ago

I agree, @sorindumitru. Seems that we're already deduplicating the handling of events for a given entry/agent. It is however, scoped to only events retrieved from normal event polling (e.g. the List*Events functions). It is possible that a given entry/agent ends up getting polled twice in the same cycle if there is both a "skipped event" successfully polled, and one or more regular events for that entry/agent, though I'd expect that to not be something that happens frequently.

Seen events only handled the "before the first" detected event. It never handled other areas of event updates. It existed primarily so the selection range in the "before the first" events could be filtered for "before the first" events that had already been processed.

Besides, the events aren't the problem, it is the Entries and Nodes that are to be fetched that needed de-duplicate. The title has a typo, as one doesn't get any event twice, they get two different events that reference the same Entry or Node, where the Entry or Node was processed "twice".