performancecopilot / pcp

Performance Co-Pilot
https://pcp.io

Prevent peak memory usage due to buffered Redis requests (run pmFetchArchive from discover.c/process_logvol in a worker thread) #1341

Open andreasgerstmayr opened 3 years ago

andreasgerstmayr commented 3 years ago

pmproxy is based on libuv, which works as depicted in this diagram: http://docs.libuv.org/en/v1.x/design.html#the-i-o-loop

When a filesystem change event happens, the discovery code starts and submits Redis requests - up to 66500 Redis requests after starting a single pmlogger. These requests get queued up until the next loop iteration. Only after the filesystem change callback is done does the libuv loop continue from the start and run the Redis callbacks, which actually send all the Redis requests over the network (and process any incoming Redis replies).

This pattern is problematic when there are multiple pmloggers, each writing to disk at a similar time. All filesystem change events are run first, i.e. thousands of Redis requests get buffered (occupying lots of memory) until they are sent in the next libuv loop iteration.

imho the problem is that the discovery code blocks the libuv loop and doesn't let the Redis client process any requests or replies. This can be solved by either having a dedicated libuv loop (in a different thread), or by partitioning the discovery function so that no single callback blocks the loop and the discovery happens in parts.

I have some rough example code ready for the latter solution, i.e. running each loop iteration of series_cache_update in a new libuv timer callback (https://github.com/andreasgerstmayr/pcp/commit/27220a0bc2ee56eb63edd3421a28aabd74bfadb9). However, the discovery code currently frees the pmResult right after invoking pmDiscoverInvokeValuesCallBack, so running the code async results in use-after-free accesses (process_logvol in discover.c, and likely similar issues in other parts).
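For reference, a minimal sketch of that partitioning idea (hypothetical names, not the actual series_cache_update code): each timer tick processes one archive and then re-arms with a zero timeout, so the libuv loop gets a chance to flush queued Redis requests and replies between ticks.

```c
#include <uv.h>

struct discovery_work {
    uv_timer_t  timer;
    int         next;       /* index of the next archive to process */
    int         count;      /* total number of archives queued */
    /* ... per-archive state would live here ... */
};

/* placeholder for the real per-archive discovery step */
static void process_one_archive(struct discovery_work *work, int i)
{
    (void)work; (void)i;
}

static void discovery_tick(uv_timer_t *timer)
{
    struct discovery_work *work = timer->data;

    if (work->next < work->count) {
        process_one_archive(work, work->next++);
        /* re-arm with a zero timeout: yields back to the loop between
         * archives so queued Redis requests/replies get processed */
        uv_timer_start(timer, discovery_tick, 0, 0);
    } else {
        uv_timer_stop(timer);
    }
}

static void discovery_start(uv_loop_t *loop, struct discovery_work *work)
{
    uv_timer_init(loop, &work->timer);
    work->timer.data = work;
    uv_timer_start(&work->timer, discovery_tick, 0, 0);
}
```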

andreasgerstmayr commented 3 years ago

Another (probably better) solution would be to offload the synchronous, blocking discovery code (calling into libpcp) into a worker thread. The Redis client functions would still need to be called from the main thread, however (they aren't thread-safe).

natoscott commented 3 years ago

Another (probably better) solution would be to offload the synchronous, blocking discovery code (calling into libpcp) into a worker thread.

My understanding is that this is what's supposed to be happening - pmproxy is multi-threaded due to the discovery code (the libuv code automatically creates threads for it under the covers, it's not something we explicitly do) - no other reason IIRC. Not sure how that fits in with Redis requests not being sent from the main thread though.

goodwinos commented 3 years ago

If pmDiscoverInvokeValuesCallBack was responsible for pmFreeResult (instead of process_logvol), then running the code async should work fine - you can throttle the redis requests however you like without having to worry about use-after-free issues. Ditto for the other metadata callback handlers.
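A minimal sketch of what that ownership change would look like, with simplified hypothetical signatures (not the real discover.c prototypes): the producer stops freeing the result, and the consumer frees it once it is done, so it can keep the result alive across any async Redis work it queues.

```c
#include <pcp/pmapi.h>

/* hypothetical stand-in for pmDiscoverInvokeValuesCallBack */
static void values_callback(pmResult *result)
{
    /* ... queue Redis requests referencing result ... */
    pmFreeResult(result);   /* now freed here (or later, once the async
                               work referencing it has completed) */
}

/* hypothetical stand-in for process_logvol */
static void process_logvol_sketch(pmResult *result)
{
    values_callback(result);
    /* no pmFreeResult() here anymore - ownership was handed to the callback */
}
```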

andreasgerstmayr commented 3 years ago

pmproxy is multi-threaded due to the discovery code (the libuv code automatically creates threads for it under the covers, it's not something we explicitly do) - no other reason IIRC.

I just did some debugging - after starting pmproxy, we only have 2 threads: the main thread and one for avahi. The moment I run any pmapi REST API call, libuv creates 4 worker threads. Setting a breakpoint before and after uv_queue_work() in webapi.c verifies that this function creates the worker threads. The discovery code, however, currently runs in the main thread.

Not sure how that fits in with Redis requests not being sent from the main thread though.

uv_queue_work() takes two callbacks: one runs on the worker thread, and the after_work_cb callback runs on the main thread. Afaics the Redis requests need to be issued in the latter callback. This still fires ~33k Redis requests in one go at startup, but - I think - they won't accumulate the way they do now. Currently, with 10 pmcd containers on my local host, we accumulate approx 330k in-flight Redis requests, because each archive is processed sequentially and libuv doesn't get to process Redis requests in between. If the discovery of each archive runs in a worker thread, I think libuv will get to process them in between. Needs testing though.
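A rough sketch of that split, assuming hypothetical helper names (discover_one_archive and submit_redis_requests are placeholders, not existing pmproxy functions): the blocking libpcp work runs in the work callback on a libuv worker thread, and the Redis requests are issued from the after-work callback, which libuv runs on the main loop thread where the (non-thread-safe) Redis client lives.

```c
#include <stdlib.h>
#include <uv.h>

/* hypothetical helpers, not existing pmproxy functions */
extern void *discover_one_archive(const char *path);
extern void submit_redis_requests(void *results);

struct archive_work {
    uv_work_t   req;
    const char  *path;      /* archive to run discovery on */
    void        *results;   /* whatever the worker produced */
};

/* runs on a libuv worker thread: blocking libpcp calls go here,
 * so the main loop keeps servicing Redis I/O in the meantime */
static void discover_work_cb(uv_work_t *req)
{
    struct archive_work *work = req->data;
    work->results = discover_one_archive(work->path);
}

/* runs on the main loop thread: the safe place to talk to the
 * (non-thread-safe) Redis client */
static void discover_after_work_cb(uv_work_t *req, int status)
{
    struct archive_work *work = req->data;
    if (status == 0)
        submit_redis_requests(work->results);
    free(work);
}

static void discover_archive_async(uv_loop_t *loop, const char *path)
{
    struct archive_work *work = calloc(1, sizeof(*work));
    work->path = path;
    work->req.data = work;
    uv_queue_work(loop, &work->req, discover_work_cb, discover_after_work_cb);
}
```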

If pmDiscoverInvokeValuesCallBack was responsible for pmFreeResult (instead of process_logvol), then running the code async should work fine - you can throttle the redis requests however you like without having to worry about use-after-free issues. Ditto for the other metadata callback handlers.

+1, or we could just wrap the pmResult in a struct with a refcount; then the individual functions don't need to know when it is safe to free the data.
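Something along these lines, perhaps (names made up, not an existing PCP API): the wrapper owns the pmResult, and pmFreeResult() only runs when the last holder drops its reference.

```c
#include <stdlib.h>
#include <pcp/pmapi.h>

/* hypothetical refcounted wrapper, not an existing PCP type */
typedef struct {
    pmResult    *result;
    int         refcount;
} result_ref;

static result_ref *result_ref_new(pmResult *result)
{
    result_ref *ref = malloc(sizeof(*ref));
    ref->result = result;
    ref->refcount = 1;
    return ref;
}

/* take an extra reference before handing the result to async work */
static void result_ref_get(result_ref *ref)
{
    ref->refcount++;    /* assumes single-threaded use on the libuv loop;
                           use atomics if references cross threads */
}

/* drop a reference; the last holder frees the underlying pmResult */
static void result_ref_put(result_ref *ref)
{
    if (--ref->refcount == 0) {
        pmFreeResult(ref->result);
        free(ref);
    }
}
```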