graphite-project / carbon

Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
http://graphite.readthedocs.org/
Apache License 2.0

Carbon-cache process timeout #787

Closed by rgleme 6 years ago

rgleme commented 6 years ago

I have the following environment:

ps axuf | grep carbo
root 9860 0.0 0.0 103244 824 pts/0 S+ 17:27 0:00 _ grep carbo
nginx 7552 25.4 0.3 297792 87424 ? R Jun21 429:13 /usr/bin/python /opt/graphite/bin/carbon-relay.py start
nginx 7557 5.9 0.0 303144 17504 ? Sl Jun21 99:30 /usr/bin/python /opt/graphite/bin/carbon-cache.py --instance=1 start
nginx 7563 5.4 0.0 303308 17588 ? Sl Jun21 91:29 /usr/bin/python /opt/graphite/bin/carbon-cache.py --instance=2 start
nginx 7569 5.9 0.0 303612 17920 ? Sl Jun21 100:58 /usr/bin/python /opt/graphite/bin/carbon-cache.py --instance=3 start
nginx 7575 5.5 0.0 303236 17504 ? Sl Jun21 93:19 /usr/bin/python /opt/graphite/bin/carbon-cache.py --instance=4 start
nginx 7581 5.4 0.0 303580 17912 ? Sl Jun21 92:07 /usr/bin/python /opt/graphite/bin/carbon-cache.py --instance=5 start
nginx 7587 6.2 0.0 303520 18024 ? Sl Jun21 105:21 /usr/bin/python /opt/graphite/bin/carbon-cache.py --instance=6 start

/opt/graphite/conf/carbon.conf

[relay]
ENABLE_UDP_LISTENER = True
USER = nginx
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 1000
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2014:1, 127.0.0.1:2024:2, 127.0.0.1:2034:3, 127.0.0.1:2044:4, 127.0.0.1:2054:5, 127.0.0.1:2064:6

[cache]
ENABLE_LOGROTATION = True
USER = nginx
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 25000
MAX_CREATES_PER_MINUTE = 21500
LOG_UPDATES = True
GRAPHITE_URL = http://myurl.intranet/
LINE_RECEIVER_INTERFACE = 127.0.0.1
PICKLE_RECEIVER_INTERFACE = 127.0.0.1
CACHE_QUERY_INTERFACE = 127.0.0.1
LOG_CACHE_QUEUE_SORTS = True
CACHE_WRITE_STRATEGY = sorted
WHISPER_AUTOFLUSH = False
USE_WHITELIST = False

[cache:1]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 25000
MAX_CREATES_PER_MINUTE = 21500
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_PORT = 2014
CACHE_QUERY_PORT = 7012

[cache:2]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 25000
MAX_CREATES_PER_MINUTE = 21500
LINE_RECEIVER_PORT = 2023
PICKLE_RECEIVER_PORT = 2024
CACHE_QUERY_PORT = 7022

[cache:3]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 25000
MAX_CREATES_PER_MINUTE = 21500
LINE_RECEIVER_PORT = 2033
PICKLE_RECEIVER_PORT = 2034
CACHE_QUERY_PORT = 7032

[cache:4]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 25000
MAX_CREATES_PER_MINUTE = 21500
LINE_RECEIVER_PORT = 2043
PICKLE_RECEIVER_PORT = 2044
CACHE_QUERY_PORT = 7042

[cache:5]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 25000
MAX_CREATES_PER_MINUTE = 21500
LINE_RECEIVER_PORT = 2053
PICKLE_RECEIVER_PORT = 2054
CACHE_QUERY_PORT = 7052

[cache:6]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 25000
MAX_CREATES_PER_MINUTE = 21500
LINE_RECEIVER_PORT = 2063
PICKLE_RECEIVER_PORT = 2064
CACHE_QUERY_PORT = 7062
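
In this config, each relay DESTINATIONS entry has the form host:port:instance, and the port is expected to match the PICKLE_RECEIVER_PORT of the corresponding [cache:N] section. A minimal sketch of checking that mapping is below. The helper name `check_destinations` and the shortened two-instance sample are assumptions made for illustration; this is not part of carbon itself.

```python
import configparser

# Shortened, hypothetical sample modelled on the config above (two instances only).
SAMPLE_CONF = """\
[relay]
DESTINATIONS = 127.0.0.1:2014:1, 127.0.0.1:2024:2

[cache:1]
PICKLE_RECEIVER_PORT = 2014

[cache:2]
PICKLE_RECEIVER_PORT = 2024
"""


def check_destinations(conf_text):
    """Return a list of mismatches between [relay] DESTINATIONS and [cache:N] ports."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(conf_text)
    problems = []
    for entry in cfg.get("relay", "DESTINATIONS").split(","):
        entry = entry.strip()
        # Each entry is host:port:instance; split from the right so the host is kept whole.
        host, port, instance = entry.rsplit(":", 2)
        section = "cache:" + instance
        if not cfg.has_section(section):
            problems.append("%s: no [%s] section" % (entry, section))
            continue
        expected = cfg.get(section, "PICKLE_RECEIVER_PORT", fallback=None)
        if expected != port:
            problems.append(
                "%s: [%s] has PICKLE_RECEIVER_PORT = %s" % (entry, section, expected)
            )
    return problems


if __name__ == "__main__":
    print(check_destinations(SAMPLE_CONF))
```

An empty list means every destination points at a pickle port that some cache instance declares. The config posted above appears consistent on that point, so this check mainly rules out a routing mistake as the cause.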

Overall, it works as expected. However, for some reason, every day at the same hour, it stops processing whisper files for Graphite.

If I attach strace to the carbon-cache processes, I see them timing out:

[pid 7559] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[pid 7559] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[pid 7559] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[pid 7559] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[pid 7559] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[pid 7559] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[pid 7559] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
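
A note on reading this trace: a select call with zero file descriptors and a {1, 0} timeout just waits one second and returns 0, which strace labels as "(Timeout)". It is a common way to implement a sleep, so these lines most likely show an idle thread sleeping in a loop rather than a failing network operation. That reading is an assumption about what pid 7559 is doing, not something the trace proves. A minimal Unix-only sketch of the same pattern, with the hypothetical helper name `sleep_via_select`:

```python
import select
import time


def sleep_via_select(seconds):
    """Wait for `seconds` using select() with no file descriptors (Unix only).

    Returns the (empty) ready lists from select and the elapsed wall time.
    """
    start = time.monotonic()
    # No descriptors to watch, so select can only return when the timeout expires.
    ready = select.select([], [], [], seconds)
    elapsed = time.monotonic() - start
    return ready, elapsed


if __name__ == "__main__":
    ready, elapsed = sleep_via_select(1.0)
    print(ready, round(elapsed, 2))
```

Running this under strace should produce a select (or equivalent) call that ends in a timeout, similar to the lines above, even though nothing is wrong with the process.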

After about an hour, everything is back to normal and the timeouts stop.

Have you seen anything like this? Could you help me?

deniszh commented 6 years ago

Please check your cron jobs. It looks like some periodic process (e.g. locate) is blocking carbon from working.
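
One way to follow that suggestion is to list the crontab entries that can fire during the hour when carbon stalls. The sketch below is a rough helper, not a full cron parser: the function names `hour_matches` and `jobs_at_hour` are hypothetical, the sample crontab is made up, and it assumes the system-crontab layout (minute, hour, day of month, month, day of week, user, command). It only understands `*`, single numbers, ranges, comma lists, and `/step`.

```python
def hour_matches(field, hour):
    """Return True if a crontab hour field such as '*', '3', '1-5', '*/2' or '0,12' covers `hour`."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            low, high = 0, 23
        elif "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)
        if low <= hour <= high and (hour - low) % step == 0:
            return True
    return False


def jobs_at_hour(crontab_text, hour):
    """Return the crontab lines whose hour field includes `hour`."""
    jobs = []
    for line in crontab_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and @daily-style shortcuts (not handled here).
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        fields = line.split()
        # Skip environment assignments and anything too short to be a job line.
        if len(fields) < 6 or "=" in fields[0]:
            continue
        try:
            if hour_matches(fields[1], hour):
                jobs.append(line)
        except ValueError:
            continue  # Unrecognised hour syntax; ignore rather than guess.
    return jobs


# Made-up sample; the commands and paths are illustrative only.
SAMPLE_CRONTAB = """\
# m h dom mon dow user command
SHELL=/bin/bash
30 3 * * * root /usr/bin/updatedb
*/5 * * * * root /usr/local/bin/collect.sh
0 0,12 * * * root /usr/sbin/logrotate /etc/logrotate.conf
"""

if __name__ == "__main__":
    for job in jobs_at_hour(SAMPLE_CRONTAB, 3):
        print(job)
```

On a real host, the text could come from files such as /etc/crontab or those under /etc/cron.d, where present. Jobs run through /etc/cron.daily and per-user crontabs would need to be checked separately.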