graphite-project / carbon

Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
http://graphite.readthedocs.org/
Apache License 2.0

carbon-cache errors with "dictionary changed size during iteration" on carbon 1.1.4 #822

Closed: standaloneSA closed this issue 5 years ago

standaloneSA commented 5 years ago

I haven't completely narrowed down what is changing, but after running properly for some time, carbon-cache is failing with an unhandled error:

01/11/2018 10:52:31 :: Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/local/lib/python3.5/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.5/dist-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.5/dist-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/opt/graphite/lib/carbon/writer.py", line 189, in writeForever
    writeCachedDataPoints()
  File "/opt/graphite/lib/carbon/writer.py", line 98, in writeCachedDataPoints
    (metric, datapoints) = cache.drain_metric()
  File "/opt/graphite/lib/carbon/cache.py", line 187, in drain_metric
    metric = self.strategy.choose_item()
  File "/opt/graphite/lib/carbon/cache.py", line 116, in choose_item
    return next(self.queue)
  File "/opt/graphite/lib/carbon/cache.py", line 104, in _generate_queue
    metric_counts = sorted(self.cache.counts, key=lambda x: x[1])
  File "/opt/graphite/lib/carbon/cache.py", line 161, in counts
    return [(metric, len(datapoints)) for (metric, datapoints) in self.items()]
  File "/opt/graphite/lib/carbon/cache.py", line 161, in <listcomp>
    return [(metric, len(datapoints)) for (metric, datapoints) in self.items()]
builtins.RuntimeError: dictionary changed size during iteration

From what I can tell, this may be an incompatibility with Python 3 (per https://stackoverflow.com/questions/11941817/how-to-avoid-runtimeerror-dictionary-changed-size-during-iteration-error).

The machine is Ubuntu 16.04, with carbon 1.1.4 installed from pip.

If I can give any other information on the machine or environment, please let me know. Thanks!

--Matt

DanCech commented 5 years ago

It sounds like counts and watermarks in _MetricCache may need to use with self.lock: to avoid thread-safety issues.
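A minimal sketch of the locking idea, assuming a dict-based cache whose writes and reads share one lock. LockedCache, store, and run_demo are hypothetical names for illustration and are not carbon's actual code; only the counts list comprehension mirrors the line shown in the traceback.

```python
import threading


class LockedCache(dict):
    """Hypothetical stand-in for a metric cache; not carbon's real class."""

    def __init__(self):
        super().__init__()
        self.lock = threading.Lock()

    def store(self, metric, datapoint):
        # Writers take the lock, so the dict cannot grow mid-iteration.
        with self.lock:
            self.setdefault(metric, []).append(datapoint)

    @property
    def counts(self):
        # Readers take the same lock while walking the live items() view.
        with self.lock:
            return [(metric, len(datapoints)) for metric, datapoints in self.items()]


def run_demo(n_metrics=2000):
    """Run one writer and one reader thread; collect any RuntimeError."""
    cache = LockedCache()
    errors = []

    def writer():
        for i in range(n_metrics):
            cache.store("metric.%d" % i, (i, 1.0))

    def reader():
        try:
            for _ in range(200):
                cache.counts
        except RuntimeError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache, errors
```

The trade-off is that every store call now waits while counts is being computed, so a large cache makes writers stall for the duration of the scan.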

piotr1212 commented 5 years ago

I guess your cache size grows, which makes that part of the code slower and increases the chance of the data being modified in another thread. It should be easily fixable by adding a list() around self.items() to mimic Python 2 behaviour, but from reading your link this would mean the whole cache is copied, which is quite inefficient. I think adding a lock might be better.
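To illustrate the difference between the live view and a list() snapshot: in Python 3, dict.items() returns a view of the dict, so changing the dict's size during iteration raises the RuntimeError from the traceback, while list(...) copies the pairs first. The sketch below is single-threaded and simulates the concurrent write inside the loop; counts_live, counts_snapshot, and demo are hypothetical names, not carbon code. Note that list(cache.items()) is a shallow copy: it builds new (key, value) tuples but does not duplicate the datapoint lists themselves.

```python
def counts_live(cache):
    """Iterate the live view while the dict grows; fails on Python 3."""
    result = []
    for metric, datapoints in cache.items():
        # Simulates another thread storing a new metric mid-iteration.
        cache[metric + ".new"] = []
        result.append((metric, len(datapoints)))
    return result


def counts_snapshot(cache):
    """Iterate a snapshot, so later inserts do not affect the loop."""
    result = []
    for metric, datapoints in list(cache.items()):
        cache[metric + ".new"] = []
        result.append((metric, len(datapoints)))
    return result


def demo():
    """Return the error message from the live loop and the snapshot result."""
    try:
        counts_live({"a": [1], "b": [2, 3]})
        live_error = None
    except RuntimeError as exc:
        live_error = str(exc)
    snapshot = counts_snapshot({"a": [1], "b": [2, 3]})
    return live_error, snapshot
```

Whether the snapshot alone is sufficient under real threading is not something this sketch shows; it only demonstrates why the view-based loop fails and what the copy changes.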

ma-tty commented 5 years ago

I've also faced this issue.

piotr1212 commented 5 years ago

The fix was insufficient; the issue is still present.

deniszh commented 5 years ago

Fix released in 1.1.5