prometheus-community / pro-bing

A library for creating continuous probers
MIT License
309 stars 52 forks source link

Fix memory bloat for unresponded to packets #33

Open TheRushingWookie opened 1 year ago

TheRushingWookie commented 1 year ago

Description

Separates out packet ID tracking from the main Ping object into IDGenerator. This allows IDGenerator to keep an LRU based set of buckets of UUIDs and evict old buckets of UUIDs if needed.

It allows users to fix a memory bloat issue where pinging an address which never responds will cause IDs used inflight to be accumulate forever.

When using the LRU deletion policy, if we hit the max number of used IDs then the last bucket of IDs is deleted. This does cause us to not be able to track dropped buckets of IDs for deduplication or per packet timeouts. I think this is a worthwhile tradeoff for users that need this.

Tests

Added unit test for a server which never responds to pings Modified existing tests to use IDGenerator correctly

Compatibility

This introduces no new APIs or dependencies and does not change overall behavior by default. If a deletion policy is used, it may prevent the tracking of duplicate packets.

SuperQ commented 1 year ago

This is similar to the work I was doing in https://github.com/prometheus-community/pro-bing/pull/9.

One thing tho, you're not doing any locking, so this is not going to be thread safe as-is.

TheRushingWookie commented 1 year ago

@SuperQ Yup this is sort of similar #9 , I took a look at that before starting this one. I think these two PRs are trying to solve different things but there are common parts.

9 is trying to separate out id generation/tracking so that timeouts + deleting can work. I think this is great. Per packet timeouts will be very useful.

This PR is trying to solve id generation/tracking to fix only memory bloat. It doesn't really care about timeouts but would allow other code to retire packets in the future. You could imagine per packet timeouts being added on top of this by attaching a wrapper around the returned ID or having a separate structure track per packet timeouts per ID.

TheRushingWookie commented 1 year ago

Reading some more of the code, it looks like we don't need locks for IDGenerator as its only accessed from the runLoop goroutine. Is there a different reason to add locks?

SuperQ commented 1 year ago

There's no way to predict how this might be used later. Any time you make code that updates a shards struct like this, better safe than sorry.

TheRushingWookie commented 1 year ago

We know how its being used currently and control how it will be used later. I don't think its best to try to predict all futures for a library and think that the library should be stricter within itself than how its used publicly

Within the package I think its better not to have locks between each subcomponent as locks are not free and we know exactly how the subcomponents work with each other. Adding locks is also not a great thing in golang as the default mutexs are not reetrant safe and can easily cause deadlocks if the components do not know the threading.

Currently, the threading is nice and simple with all logic contained within one thread (runloop) and the icmprecving thread passing it to through a channel to runloop. I'm actually really happy that the icmprecving thread passed it through a channel to runloop instead of trying to grab locks and update the structures itself.

Locking at the public API level is different and great as it means APIs are now threadsafe and no one has to tack locks on when using the API in a way the maintainers can't predict

SuperQ commented 1 year ago

Yes, exactly, Next(), Retire(), and RemoveAfterMaxItems() are public API functions. They need locking protection if they mutate the state.

TheRushingWookie commented 1 year ago

Oh wow, yeah I forgot to make these private to within the library. They should not be called by external users!

Does this seem like the right direction for a PR? If not, let me know and I'll wait for the refactor. I won't have as much time in the next few weeks to contribute to pro-bing so I'm trying to see if I can get this in before I have to stop working with pro-bing

TheRushingWookie commented 1 year ago

I've updated the PR to make all the APIs private

TheRushingWookie commented 1 year ago

@SuperQ Can I get a review of this? I will probably not be able to work more on pro-bing starting next wednesday

TheRushingWookie commented 1 year ago

Still looking for a review of this

SuperQ commented 1 year ago

Sorry, I've been traveling and have a bit of a review backlog.

TheRushingWookie commented 1 year ago

Hi, could I get a look at this when you have some time?