charlesdunbar closed this issue 4 years ago
I know this exact problem. This is the patch on master that deals with it by querying all caches: https://github.com/graphite-project/graphite-web/commit/48bbfbe073df7852625b9462907ac56f9d65a297
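Roughly, the approach in that commit can be pictured like this - a simplified sketch, not the actual diff; `hosts`, `send_query`, and `select_host` are stand-in names for whatever the CarbonLink pool exposes:

```python
# Simplified sketch of the "query all caches" idea (stand-in names, not
# the real graphite-web CarbonLink API).
def query_cache(pool, metric):
    if metric.startswith('carbon.'):
        # carbon's self-instrumentation isn't routed by the hash ring:
        # each daemon stores its own stats locally, so any instance may
        # hold them. Fan out and merge.
        datapoints = []
        for host in pool.hosts:            # every CARBONLINK_HOSTS entry
            datapoints.extend(pool.send_query(host, metric))
        return datapoints
    # Normal metrics live on exactly one instance, per consistent hashing.
    return pool.send_query(pool.select_host(metric), metric)
```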
Please note that it applies only to carbon.* metrics - they're processed in a special way.
@cbowman0 - Thanks for the quick response! I'll look into applying that patch.
@deniszh - I think it's only carbon.agents.* metrics - happy to rename the issue for clarity.
Follow-up question - is there anywhere to track when/if master gets released as a version? I just noticed how long ago that patch was committed. Is master what 0.10 is going to become, or is it always just bleeding edge?
Until now, master has never been released; it's always been bleeding edge. All releases were done from the 0.9.x branch, but the next major release will be 1.0, cut from master - still not clear when, though.
Hello @charlesdunbar, we've now tagged 1.0.0-rc1 from master - please test it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Using 0.9.15, but the issue may exist on master as well.
When running multiple carbon-cache instances on a machine, the carbon-cache instrumentation metrics may not be queried from memory - only what's already on disk is returned. This leaves blank points on the graph until carbon-cache flushes its own metrics to disk.
An example I'm running into is trying to find the metric "carbon.agents.graphite-be2-prod-b.cache.size". From what I can understand from the carbon-cache code and using manhole, these carbon metrics live in the MetricCache of the specific cache, in this case cache:b.
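(For anyone reproducing this, a manhole session on the cache:b daemon can confirm where the points live - a rough sketch; the exact module layout varies by carbon version.)

```python
# Inside a manhole (interactive Python) session on the cache:b daemon.
# In 0.9.x, MetricCache is a module-level dict-like singleton.
from carbon.cache import MetricCache
sorted(k for k in MetricCache if k.startswith("carbon.agents."))
# expected to include 'carbon.agents.graphite-be2-prod-b.cache.size'
```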
The issue is that those metrics don't appear when accessing the data via the web interface. JSON output for the relevant timeframe:
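(The original paste isn't reproduced here; for the shape of it, the render API returns entries like the one below, with `null` wherever no datapoint came back - the values and timestamps are illustrative only.)

```json
[
  {
    "target": "carbon.agents.graphite-be2-prod-b.cache.size",
    "datapoints": [[null, 1467158400], [null, 1467158460], [null, 1467158520]]
  }
]
```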
I've configured CARBONLINK_HOSTS to include every cache_query_port on my local machine - 16 instances in this case.
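For reference, the equivalent local_settings.py entry would look something like this - the ports are an assumption, following the common 7002 + 100·n pattern implied by cache:h answering on 7702:

```python
# graphite-web local_settings.py - one "host:port:instance" entry per
# local carbon-cache; ports here are assumed, matching cache:h on 7702.
CARBONLINK_HOSTS = [
    "127.0.0.1:7002:a", "127.0.0.1:7102:b", "127.0.0.1:7202:c",
    "127.0.0.1:7302:d", "127.0.0.1:7402:e", "127.0.0.1:7502:f",
    "127.0.0.1:7602:g", "127.0.0.1:7702:h",
    # ...and so on through "127.0.0.1:8502:p" for 16 instances.
]
```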
A tcpdump of the localhost interface across all of those ports shows only 7702 (cache:h) being accessed when the query runs. Since cache:h doesn't actually hold any of cache:b's carbon-cache metrics, I assume that's why I'm seeing nulls.
Using carbonate and `carbon-lookup`, I see that the hash ring expects that metric to live on cache:h, which is why I see it being accessed in the tcpdump. The issue appears to be that https://github.com/graphite-project/graphite-web/blob/0.9.15/webapp/graphite/render/datalib.py#L116-L118 is used to determine which cache to query, which works as expected for every metric except the carbon-cache metrics - those are special and aren't routed like every other metric.
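To make the routing mismatch concrete, here's a self-contained toy ring - not carbon's actual ConsistentHashRing from lib/carbon/hashing.py, whose hashing details differ, but the same routing behaviour: the metric *name* alone decides which instance gets queried, regardless of which instance produced the datapoints.

```python
import bisect
from hashlib import md5

class Ring:
    """Toy consistent-hash ring, standing in for carbon's real one."""
    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` points on the ring, keyed by a hash.
        self.ring = sorted(
            (int(md5(("%s:%d" % (node, i)).encode()).hexdigest()[:8], 16), node)
            for node in nodes
            for i in range(replicas)
        )

    def get_node(self, key):
        # Route a key to the first ring point at or after its hash.
        h = int(md5(key.encode()).hexdigest()[:8], 16)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

instances = ["cache:%s" % c for c in "abcdefghijklmnop"]  # 16 local caches
ring = Ring(instances)
# graphite-web queries whichever instance the ring picks for the name;
# nothing guarantees that's the instance whose stats these are (cache:b):
print(ring.get_node("carbon.agents.graphite-be2-prod-b.cache.size"))
```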
I don't use carbon-relay or carbon-aggregator, but this looks like a carbon-cache-only issue: lib/carbon/instrumentation.py calls `cache.MetricCache.store(fullMetric, datapoint)` for a carbon-cache metric, while the relay and aggregator use `events.metricGenerated(fullMetric, datapoint)`. I'm not sure whether the correct fix is for datalib.py to query the specific instance for carbon-cache metrics, or for carbon-cache metrics in instrumentation.py to also go through events.metricGenerated so they get routed like everything else.
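To frame that second option, a hypothetical edit to instrumentation.py could look like the sketch below - `publish_self_metric` is an invented name for illustration, and whether metricGenerated is actually wired up inside a standalone cache daemon is version-dependent, so treat this purely as a sketch of the idea.

```python
# Hypothetical sketch of option two: have carbon-cache emit its own
# stats through the same event the relay and aggregator already use,
# so the datapoints get routed by the hash ring like any other metric.
from carbon import events

def publish_self_metric(fullMetric, datapoint):
    # was: cache.MetricCache.store(fullMetric, datapoint)
    events.metricGenerated(fullMetric, datapoint)
```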