Closed: dkulikovsky closed this issue 8 years ago.
Sounds interesting, what is the size of your relays destination queues. Is it growing while the memory usage raises? I see something similar with carbon-relay (master branch) not freeing memory after it was used for queueing.
The number of queued datapoints grows while memory usage rises. I added some debug logging and saw that the daemons do add new points to the queue. I set the daemons' queue limit to 1000 points and it doesn't help: a daemon hits the limit and drops datapoints, but still uses gigabytes of memory after a while.
I reduced the number of aggregation rules to 500 (was 1000) and memory usage dropped by half; the bug is somewhere there.
Only a wild guess: could it be a case of catastrophic backtracking in one of your regular expressions in aggregation-rules causing things to back up?
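To illustrate the guess above: catastrophic backtracking happens when a nested quantifier such as `(a+)+` gives the regex engine exponentially many ways to partition the input, all of which get tried once the overall match fails. This is a hypothetical pattern for demonstration, not one taken from the aggregation rules in this thread.

```python
import re
import time

# Hypothetical pattern demonstrating catastrophic backtracking: the
# nested quantifier (a+)+ has exponentially many ways to split a run
# of 'a's, and all of them are tried when the final 'b' never matches.
pattern = re.compile(r'^(a+)+b$')
text = 'a' * 24 + 'c'  # ends in 'c', so every split must be attempted

start = time.time()
match = pattern.match(text)
elapsed = time.time() - start

print(match)  # the match fails, but only after a long backtracking search
```

A pattern like this looks harmless at 10 characters of input but blows up at 30; if one rule's regex behaved this way, queues would back up exactly as described.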
The rules are pretty simple:

```
one_min.agg.test995 (60) = avg one_min.agg_test.test995.m*
one_min.agg.test996 (60) = avg one_min.agg_test.test996.m*
one_min.agg.test997 (60) = avg one_min.agg_test.test997.m*
one_min.agg.test998 (60) = avg one_min.agg_test.test998.m*
one_min.agg.test999 (60) = avg one_min.agg_test.test999.m*
one_min.agg.test1000 (60) = avg one_min.agg_test.test1000.m*
```
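The `m*` glob in each rule selects the member metrics to average. A rough sketch of that selection, using Python's `fnmatch` as a stand-in for carbon's actual pattern matching (which is not reproduced here; note that fnmatch's `*` also crosses dots, unlike a per-node glob):

```python
import fnmatch

# Stand-in sketch: which metric names a rule's input pattern selects.
pattern = 'one_min.agg_test.test999.m*'
metrics = [
    'one_min.agg_test.test999.m01',
    'one_min.agg_test.test999.m02',
    'one_min.agg_test.test998.m01',
]

members = [m for m in metrics if fnmatch.fnmatch(m, pattern)]
print(members)  # only the two test999 members match
```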
I think I have found a leak: the cache in rules.py can grow unattended, and disabling it fixes the problem:

```diff
--- lib/carbon/aggregator/rules.py	2013-11-19 23:05:48.632684971 +0400
+++ /var/lib/graphite/lib/carbon/aggregator/rules.py	2013-11-19 23:00:44.158481765 +0400
@@ -93,7 +93,7 @@
     except:
       log.err("Failed to interpolate template %s with fields %s" % (self.output_template, extracted_fields))
-    self.cache[metric_path] = result
+    #self.cache[metric_path] = result
     return result

   def build_regex(self):
```
We have a similar, constant memory leak in the relay when using consistent hashing. On a relay receiving about 200k metrics per minute and taking some time to write to disk (network storage with some IOPS latency), we have to restart the relays every 12 hours to free the memory; and by "free" I mean about 20GB leaked per day.
We also have an issue when using consistent hashing, as FRLinux does. Our environment sends ~150k metrics every 10 seconds, which is a relatively light load for the node. The relay executes the hash without much queuing, but memory consumption spikes through the roof (~5GB) within an hour of a relay restart. We exhaust memory and hit the OOM killer well under the 12 hours FRLinux is seeing.
@dkulikovsky, would the ideal fix for this be a configuration item to limit the size of the cache, and logic to purge LRU items from the cache?
It seems like the rate of memory growth would depend on how often "new" metrics are seen by the aggregator. I'm getting millions of metrics every 5 minutes, and very slow (but non-zero) growth in relay and aggregator memory usage. @frlinux, are you often seeing "new" metrics?
Hey @jamesjuran, we don't have any real new metrics, if by new you mean metrics that did not exist before and had to be created. The growth in carbon-relay in consistent-hashing mode is steady and linear. For instance, at about 280,000 metrics per minute, carbon-relay in hashing mode will grow to 10 gigs of memory in 6 hours. Restarting the relay purges the memory. We see that behavior on all relays using hashing. Our setup is all 0.9.10 though, so I'm going to test that theory with the 0.9.12 that I grabbed today.
I'm using 0.9.12 + a few patches (none of which should have anything to do with memory consumption). Hopefully your upgrade helps.
Thanks, will report once I am done. Hoping to test this tomorrow; I just need to look at the new way to configure the different carbons.
Including a graph of my latest test, pushing 500k metrics per minute. The first graph on the left is carbon-cache memory usage, the middle one is the relay memory usage (consistently growing, as you can see), and the last is the number of metrics pushed.
Over an hour, carbon-relay is consistently hogging memory in hashing mode. Here is my configuration for a given cache and the relay that talks to all caches:
```ini
[cache:1]
USER = apache
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7201
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = inf
WHISPER_AUTOFLUSH = True
LOG_UPDATES = False

[relay:1]
RELAY_METHOD = consistent-hashing
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
USE_FLOW_CONTROL = True
DESTINATIONS = localhost:2004:1,localhost:2104:2,localhost:2204:3,localhost:2304:4,localhost:2404:5,localhost:2504:6,localhost:2604:7,localhost:2704:8
```
And if you switch to rule based relay and just configure the single cache what do your graphs look like?
@drawks we have that in production: rule-based relaying does not leak, only the relay in hashing mode. It has been there for a while; we've seen it on 0.9.10, and today I tested the latest 0.9.12 pull from GitHub to see if that fixed it. Both code bases have this issue.
Are your tests running against a system without any metrics on it yet? If you are blocking on file creates, it'll balloon up a bit. FWIW, I'm not seeing any unbounded growth here using CH or aggregation-aware CH.
So yes, all metrics were created on the spot, but this is on RAID SSD; it took only a few minutes to create them all. From then on, each minute sees an update to the 500k metrics, not new metric creation. As you can see in my config above, there is no throttling of any sort.
What version of Python are you on? I'm similarly on fast SSDs without any throttling configured. I haven't seen evidence of a memory leak, but I have seen cases where a large number of connections seems to prevent the relay/aggregator from draining. What does the `carbon.relays.*.destinations.*.sent` graph look like?
Here is a graph of the different relays' destinations.
Python 2.7 on CentOS 6.4 across the board.
So does the sum of sent track the number received? I'm confused as to what your 4 graphs are. 0.9.x should only have "sent", "queueUntilReady", "fullQueueDrops" and "attemptedRelays".
This patch fixes this problem:
```diff
--- a/lib/carbon/aggregator/rules.py
+++ b/lib/carbon/aggregator/rules.py
@@ -93,7 +93,7 @@ class AggregationRule:
     except:
       log.err("Failed to interpolate template %s with fields %s" % (self.output_template, extracted_fields))
-    self.cache[metric_path] = result
+    if result: self.cache[metric_path] = result
     return result

   def build_regex(self):
```
Story: first of all, I decided that disabling the cache entirely is not a good idea, so I tried to find a proper way to solve the problem.

My first attempt was with Redis: I moved the whole cache into a Redis database and set up a connection pool for the carbon daemons. This didn't help, obviously; storing and fetching data in Redis is much more expensive than working with memory. But while debugging the values stored in Redis, I found that most of them were None: metric name as key, None as value.

In my second attempt I tried to fix the cache usage itself. I returned to the native code, added some debugging for the cache, and the result was the same as with Redis: tons of None values stored in the cache. The patch above fixes cache usage by only storing values that are not None.

I profiled the code with and without my patch, and it seems that storing None values in the cache (to make this function skip non-matching metrics quickly) doesn't actually reduce CPU usage, so my patch shouldn't reduce capacity.
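The leak pattern can be sketched in isolation (hypothetical helper names, not carbon's exact code): caching a None result for every metric that matches no rule makes the cache grow with total metric cardinality rather than with the metrics that actually aggregate.

```python
# Sketch of the leak and the fix; get_aggregate_metric and the prefix
# match are stand-ins for carbon's rule evaluation.
cache = {}

def get_aggregate_metric(metric_path, rules):
    if metric_path in cache:
        return cache[metric_path]
    result = None
    for prefix, output in rules:
        if metric_path.startswith(prefix):  # stand-in for the rule regex
            result = output
            break
    if result is not None:  # the fix: never store None results
        cache[metric_path] = result
    return result

rules = [('one_min.agg_test.', 'one_min.agg.out')]
for i in range(10000):
    get_aggregate_metric('unrelated.metric.%d' % i, rules)

print(len(cache))  # 0: non-matching metrics no longer fill the cache
```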
@dkulikovsky nice. Am I right to think that a similar patch is needed for relay though?
In the megacarbon branch this patch is only needed where the aggregator, or a relay with aggregated-consistent-hashing, is used; both use rules.py to calculate destinations for metrics.
I have applied that fix, but it does not fix it for me; I restarted a few hours ago and memory still grows as rapidly as before (see graph).
This patch works only in megacarbon branch.
Thanks for the patch, although I'm not sure if it fixed anything for us. But I've applied it anyway :)
This patch greatly improves memory usage when using relays and aggregators with aggregated-consistent-hashing in 0.9.12, but it still doesn't solve the issue for me.
After processing about 1M metrics, 2 relays + 2 aggregators grew to consume about 2.4GiB of RAM in total, which is not that problematic on its own. The problem is that the whole system isn't receiving any metrics right now, and the memory hasn't been freed.
Any pointers to what causes this?
I believe the root cause is an unbounded dict that is used as a cache for the mapping between metric name and the aggregation metric that it belongs to. The PR above adds an LRU cache for this instead of a naked python dict, with a new setting to tune it.
This change bounds the memory use at the expense of possibly more CPU (when there's a cache miss, you have to evaluate the aggregation rules).
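A minimal sketch of such a bounded LRU cache, built on `collections.OrderedDict` (the PR's actual setting name and implementation may differ):

```python
from collections import OrderedDict

# Minimal LRU cache sketch: bounds memory at the cost of re-evaluating
# the aggregation rules on a miss.
class LRUCache(object):
    def __init__(self, max_size):
        self.max_size = max_size
        self.data = OrderedDict()

    def get(self, key):
        try:
            value = self.data.pop(key)  # remove, then re-insert below
        except KeyError:
            return None
        self.data[key] = value          # re-insert as most recently used
        return value

    def put(self, key, value):
        if key in self.data:
            self.data.pop(key)
        elif len(self.data) >= self.max_size:
            self.data.popitem(last=False)  # evict least recently used
        self.data[key] = value

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')       # touch 'a', making 'b' the LRU entry
cache.put('c', 3)    # evicts 'b'
print(sorted(cache.data))  # ['a', 'c']
```

The `pop`-then-reinsert trick keeps iteration order equal to recency order, so eviction is always the first item.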
I can confirm that this is indeed the root cause. I implemented a similar LRU cache using lru-dict and solved the unbounded memory growth, but I haven't gotten it working well enough to submit.
@jamesjuran We've been running with this fix for 2 days with pretty good results:
To keep python 2.6 compatibility, you can add a requirement for the ordereddict package.
I can confirm that too.
Server is crashing because 100% of memory is used by python. What did you do to fix this please?
@jamesjuran @bruce-lyft Anyone have a patch suitable for a PR?
Has be2e4f0d919b4cbbaf6743eebde4e41e3c775e73 improved things for anyone?
I'm not sure a bounded LRU cache makes a lot of sense in this case. Say we have a relay with this problem:
If we have an LRU cache that can hold 100,000 metrics, we cache the first 100k results, the next 100k boots those last 100k from the cache, the next 100k boots the prior... Basically we just eat memory and CPU and never get the benefit of the cache.
I can see some scenarios where a bounded cache might be of some benefit, e.g. a wide distribution of metric frequencies (some sent at 1s, 10s, 60s, 10m, whatever), but that doesn't seem like the typical case a carbon relay/aggregator sees.
I also wonder how much help this cache actually is - perhaps someone with a rather large aggregator rule set could try removing it to see what happens to CPU usage?
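The thrashing concern above can be sketched numerically: with strict LRU eviction and a metric stream larger than the cache, repeated sequential passes never hit the cache at all (sizes scaled down from the 100k example).

```python
from collections import OrderedDict

# Rough simulation of LRU thrashing: cycling through more distinct
# metrics than the cache holds yields a 0% hit rate, because each pass
# evicts entries just before they would be reused.
capacity, num_metrics = 100, 150
cache = OrderedDict()
hits, misses = 0, 0

for _ in range(10):  # ten passes over the same metric stream
    for i in range(num_metrics):
        key = 'metric.%d' % i
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # refresh recency on a hit
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used

print(hits, misses)  # 0 hits, 1500 misses
```

So a bounded LRU only pays off when the hot working set actually fits in the cache; a sequential scan of a larger set is the pathological case.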
I never responded to this thread, apologies. The issue we had in our setup was with relays using hashing, and it turned out to be a bug in the Python packages we used, which had a memory leak in hashlib (https://lists.fedoraproject.org/pipermail/scm-commits/2012-July/832476.html).
After patching this, we never had a problem with the relays again; they have been rock stable since, as long as you do not feed them too many metrics.
Never used carbon aggregator so cannot comment on that one.
@mleinart The problem with the fix in be2e4f0 is that memory use is still unbounded. In our case, we have new metrics coming in on a regular basis, with a short time horizon (collectd/statsd metrics that are per-host, for ephemeral hosts, but which are aggregated; for example, CPU average for a group of hosts). With only your change we see carbon-aggregator memory use grow without bound and eventually crash. I believe that a change like the PR I submitted (https://github.com/graphite-project/carbon/pull/335) is still required.
@mleinart @bruce-lyft Would it make sense to offer something like #335 as an optional strategy?
I believe the aggregator leak in #185 is almost certainly the result of feeding aggregated metrics back into the aggregator loop as described in #455 and #560. There's a proposed fix in #439 but it needs some TLC before we can merge it. There's also an external fix applied to https://github.com/criteo-forks/carbon/commit/825693c97e57c7cdc36d0b0d758694010ce2be53 but I'd like to get feedback from others before we consider either of these.
Note that I'm not seeing any memory leaks in relays running at moderate volume (100k points/sec).
/cc @mleinart @drawks @iain-buclaw-sociomantic
Adding to my previous comment, I believe that any previous relay memory leaks have already been fixed in master. The aggregator "leak" that @bruce-lyft describes is a consequence of the buffer dictionary memory footprint as it grows to accommodate the maximum concurrent number of buffers. Any existing memory should be reused, and my tests corroborate this hypothesis.
I agree with @mleinart's sentiment about the LRU cache not being the right approach. This would short-circuit Carbon's intended design and effectively trade memory for CPU consumption.
I have some thoughts about expiring metric buffers but I'm not sure how to tackle that yet.
Unless anyone has a reproducible case for memory leaks that aren't a natural byproduct of datapoints being stored in Python dictionaries, I propose that we close this issue as resolved. I acknowledge that there are times when Carbon memory can grow unbounded (e.g. `MAX_CACHE_SIZE = inf`), but these are tunable options (e.g. `MAX_CACHE_SIZE`, `MAX_QUEUE_SIZE`), and Carbon provides us with the internal statistics needed to identify these conditions before they become fatal.
I've got leaking megacarbon instances. The load is low: 50-100k metrics per minute. The config is simple; the pipeline is 2 relays with aggregated-consistent-hashing -> 1 aggregator with consistent-hashing -> 2 writers. Instances running the relay or aggregation role consume tons of memory.

This happens really fast. Just after restarting the megacarbon daemons and starting the perf-test tool, the relay and aggregator consume about 720MB of memory each in the first minute; in the second minute memory usage grows to 3GB each; after 20-30 minutes the processes consume 6-11GB each, and so on. The top output above was taken 10 hours after restarting the carbon daemons (the perf test lasted 30 minutes, and since then the daemons have been working only on data they generate themselves). If RELAY_METHOD is set to relay-rules for the relay instances, everything is fine and memory usage is about 30-50MB, but for the aggregator this doesn't help at all.

I've tried to catch this memory leak with different tools, but I didn't succeed.
Current config for relay is
for aggregator: