graphite-project / carbon

Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
http://graphite.readthedocs.org/
Apache License 2.0
1.5k stars 490 forks source link

#956 Flow control fixes #957

Open bucko909 opened 3 weeks ago

bucko909 commented 3 weeks ago

See #956.

This is not a complete fix for the aggregator, but does fix the cache, relay and rewriter daemons.

I've tested the cache daemon manually and verified that it only outputs "MetricCache is nearly full" and "MetricCache below watermark" with a sufficiently high cache size. With a small cache size, it will still drop datapoints ("MetricCache is full") with these settings.

The "fix MAX_QUEUE_SIZE to match config file default" commit may be controversial. I'm happy to do whatever with this to get this merged; it just felt wrong when I found it.

bucko909 commented 3 weeks ago

Push just resets to preferred email.

deniszh commented 3 weeks ago

@bucko909 : could you please check failed tests, pretty please? Thanks!

deniszh commented 3 weeks ago

And rebase or conflict fix needed too

bucko909 commented 2 weeks ago

Rebased to fix conflicts. Only meaningful change is the conflict, which was in the middle of these changes, and is added as context in the diff:

$ (FROM=4e105a0; NOW=d7926bd; diff -u5  <(git diff -U10 $(git merge-base $FROM origin/master)..$FROM) <(git diff -U10 $(git merge-base $NOW origin/master)..$NOW))
--- /dev/fd/63  2024-08-29 15:20:54.847098593 +0100
+++ /dev/fd/62  2024-08-29 15:20:54.851098553 +0100
@@ -131,14 +131,14 @@
  # If enabled this setting is used to timeout metric client connection if no
  # metrics have been sent in specified time in seconds
  #METRIC_CLIENT_IDLE_TIMEOUT = None

 diff --git a/lib/carbon/cache.py b/lib/carbon/cache.py
-index cf8b7d1..a7a0c7e 100644
+index 72ccf6c..0304143 100644
 --- a/lib/carbon/cache.py
 +++ b/lib/carbon/cache.py
-@@ -200,20 +200,27 @@ class _MetricCache(defaultdict):
+@@ -201,20 +201,27 @@ class _MetricCache(defaultdict):
              in self.items()]

    @property
    def watermarks(self):
      return [(metric, min(datapoints.keys()), max(datapoints.keys()))
@@ -162,11 +162,11 @@
    def _check_available_space(self):
      if state.cacheTooFull and self.size < settings.CACHE_SIZE_LOW_WATERMARK:
        log.msg("MetricCache below watermark: self.size=%d" % self.size)
        events.cacheSpaceAvailable()

-@@ -244,22 +251,26 @@ class _MetricCache(defaultdict):
+@@ -245,22 +252,26 @@ class _MetricCache(defaultdict):

      return sorted(datapoint_index.items(), key=by_timestamp)

    def store(self, metric, datapoint):
      timestamp, value = datapoint
@@ -180,20 +180,20 @@
          else:
 +          if self.is_nearly_full:
 +            # This will disable reading when flow control is enabled
 +            log.msg("MetricCache is nearly full: self.size=%d" % self.size)
 +            events.cacheFull()
+           if not self[metric]:
+             self.new_metrics.append(metric)
            self.size += 1
            self[metric][timestamp] = value
            if self.strategy:
              self.strategy.store(metric)
        else:
          # Updating a duplicate does not increase the cache size
          self[metric][timestamp] = value

- 
- _Cache = None
 diff --git a/lib/carbon/client.py b/lib/carbon/client.py
 index d9ba5a9..6bed7b7 100644
 --- a/lib/carbon/client.py
 +++ b/lib/carbon/client.py
 @@ -28,20 +28,24 @@ try:
bucko909 commented 2 weeks ago

I think this should fix the tests -- apologies; I just ran pytest but it looks like it didn't find the other two classes in that file.

The fixes are squashed into "[#956] allow instances to go 25% over their hard max".

tox with no parameters also runs a benchmark. The benchmark does not appear to terminate in any reasonable time when performing "Benchmarking unique metric MetricCache drain..." on 1M datapoints. I tested it on master and the same is true, so I've not broken this.

codecov-commenter commented 2 weeks ago

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 37.93103% with 18 lines in your changes missing coverage. Please review.

Project coverage is 50.46%. Comparing base (fdc56f6) to head (dbf2925). Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
lib/carbon/service.py 0.00% 9 Missing :warning:
lib/carbon/client.py 16.66% 4 Missing and 1 partial :warning:
lib/carbon/conf.py 0.00% 3 Missing :warning:
lib/carbon/events.py 50.00% 0 Missing and 1 partial :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #957 +/- ## ========================================== - Coverage 50.63% 50.46% -0.18% ========================================== Files 36 36 Lines 3446 3474 +28 Branches 535 534 -1 ========================================== + Hits 1745 1753 +8 - Misses 1574 1592 +18 - Partials 127 129 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.