graphite-project / carbon

Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
http://graphite.readthedocs.org/
Apache License 2.0

Increase and control threads during import [Q] #926

Closed Trolls closed 2 years ago

Trolls commented 2 years ago

Hello,

In a standalone configuration, I am trying to increase and control the number of threads used when new metric database files (.wsp) are created.

With the carbon configuration file below, file creation is too slow, and I have a lot of values to import.

[cache]
STORAGE_DIR    = /var/lib/carbon/
LOCAL_DATA_DIR = /var/lib/carbon/whisper/
WHITELISTS_DIR = /var/lib/carbon/lists/
CONF_DIR       = /etc/carbon/
LOG_DIR        = /var/log/carbon/
PID_DIR        = /var/run/
ENABLE_LOGROTATION = True
USER = carbon
#MAX_CACHE_SIZE = inf
MAX_CACHE_SIZE = 1000000
MAX_UPDATES_PER_SECOND = inf
MAX_CREATES_PER_MINUTE = inf
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
ENABLE_UDP_LISTENER = False
UDP_RECEIVER_INTERFACE = 0.0.0.0
UDP_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
LOG_LISTENER_CONNECTIONS = True
USE_INSECURE_UNPICKLER = False
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7002
USE_FLOW_CONTROL = True
LOG_UPDATES = False
LOG_CACHE_HITS = False
LOG_CACHE_QUEUE_SORTS = True
CACHE_WRITE_STRATEGY = sorted
WHISPER_AUTOFLUSH = False
WHISPER_FALLOCATE_CREATE = True
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
LOG_LISTENER_CONNECTIONS = True
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_QUEUE_SIZE = 10000
QUEUE_LOW_WATERMARK_PCT = 0.8
USE_FLOW_CONTROL = True
[aggregator]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2023
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2024
LOG_LISTENER_CONNECTIONS = True
FORWARD_ALL = False
DESTINATIONS = 127.0.0.1:2004
REPLICATION_FACTOR = 1
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_AGGREGATION_INTERVALS = 5

In my bash script, I use this command during the import:

echo myroot.perf.type.serial.object.object_id.metric "Value" "timestamp" | nc localhost 2003

To speed up writes, I launch the commands in the background with "&". I tried to limit the number of these background jobs, but sometimes I get these errors:

Ncat: Connection timed out.

Ncat: Cannot assign requested address.

Does anybody know a better way to import a lot of values, or have any information about these messages?

Many thanks for any help

Br,

deniszh commented 2 years ago

You already have MAX_CREATES_PER_MINUTE = inf, so you are hitting the limit of a single carbon instance; you do indeed need to spawn more carbon processes. Each process should have its own port, e.g. 2003, 2005, 2007, etc. For that you need to create more [cache] sections, e.g. [cache:a], [cache:b], etc. Then you can use carbon-cache --instance=a to spawn instance a, carbon-cache --instance=b to spawn instance b, and so on. Then you need to feed each instance a separate stream of metrics. In your case you can probably do that manually, but it's also possible to use a relay for that: put all your carbon instances in the DESTINATIONS parameter of the [relay] section (e.g. DESTINATIONS = 127.0.0.1:2003, 127.0.0.1:2005, 127.0.0.1:2007) and feed all data into the relay port (2013).
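
As a rough illustration of feeding all data into one port, the Python sketch below sends many datapoints over a single TCP connection in the plaintext "path value timestamp" format, instead of starting one nc process per value. Opening a new connection for every datapoint is a plausible (though unconfirmed in this thread) cause of the "Cannot assign requested address" errors, since each short-lived connection uses up a local port. The function names, the batch size of 500 and the timeout are assumptions made for this example; they are not part of carbon.

```python
import socket
from typing import Iterable, Iterator, List, Tuple

# One datapoint: (metric path, value, unix timestamp)
Point = Tuple[str, float, int]


def format_line(path: str, value: float, timestamp: int) -> str:
    """Render one datapoint as a plaintext line: 'path value timestamp\\n'."""
    return f"{path} {value} {int(timestamp)}\n"


def chunked(points: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Yield successive lists of at most `size` datapoints."""
    batch: List[Point] = []
    for point in points:
        batch.append(point)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_payload(batch: Iterable[Point]) -> bytes:
    """Join a batch of datapoints into one newline-terminated byte string."""
    return "".join(format_line(*p) for p in batch).encode("ascii")


def send_points(points: Iterable[Point], host: str = "localhost",
                port: int = 2013, batch_size: int = 500,
                timeout: float = 10.0) -> int:
    """Send all datapoints over ONE connection; return how many were sent.

    The default port 2013 is the relay's LINE_RECEIVER_PORT from the
    configuration in this thread (hypothetical default for this sketch).
    """
    sent = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for batch in chunked(points, batch_size):
            sock.sendall(build_payload(batch))
            sent += len(batch)
    return sent
```

The import script would then collect its values into a list of (path, value, timestamp) tuples and call send_points once, or once per destination instance if the metrics are split manually.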

deniszh commented 2 years ago

PS: Or you can use e.g. go-carbon to scale this task across multiple CPUs automatically. It's compatible with carbon, but its config format is different.

Trolls commented 2 years ago

Thank you deniszh.

I just tried this configuration file; it should be faster.

Is it correct?

[cache]
STORAGE_DIR    = /var/lib/carbon/
LOCAL_DATA_DIR = /var/lib/carbon/whisper/
WHITELISTS_DIR = /var/lib/carbon/lists/
CONF_DIR       = /etc/carbon/
LOG_DIR        = /var/log/carbon/
PID_DIR        = /var/run/
ENABLE_LOGROTATION = True
USER = carbon
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = inf
MAX_CREATES_PER_MINUTE = inf
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
ENABLE_UDP_LISTENER = False
UDP_RECEIVER_INTERFACE = 0.0.0.0
UDP_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
LOG_LISTENER_CONNECTIONS = True
USE_INSECURE_UNPICKLER = False
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7002
USE_FLOW_CONTROL = True
LOG_UPDATES = False
LOG_CACHE_HITS = False
LOG_CACHE_QUEUE_SORTS = True
CACHE_WRITE_STRATEGY = sorted
WHISPER_AUTOFLUSH = False
WHISPER_FALLOCATE_CREATE = True

[cache:b]
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
CACHE_QUERY_PORT = 7102
[cache:c]
LINE_RECEIVER_PORT = 2113
PICKLE_RECEIVER_PORT = 2115
CACHE_QUERY_PORT = 7112
[cache:d]
LINE_RECEIVER_PORT = 2123
PICKLE_RECEIVER_PORT = 2125
CACHE_QUERY_PORT = 7122
[cache:e]
LINE_RECEIVER_PORT = 2133
PICKLE_RECEIVER_PORT = 2135
CACHE_QUERY_PORT = 7132

[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
LOG_LISTENER_CONNECTIONS = True
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004, 127.0.0.1:2104:b, 127.0.0.1:2115:c, 127.0.0.1:2125:d, 127.0.0.1:2135:e
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_QUEUE_SIZE = 10000
QUEUE_LOW_WATERMARK_PCT = 0.8
USE_FLOW_CONTROL = True

[aggregator]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2023
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2024
LOG_LISTENER_CONNECTIONS = True
FORWARD_ALL = False
DESTINATIONS = 127.0.0.1:2004
REPLICATION_FACTOR = 1
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_AGGREGATION_INTERVALS = 5

I guess all carbon-cache instances were started and are running?

[root@vm-grafana carbon]# ps -ef | grep carbon
carbon    3397     1 61 00:26 ?        00:09:19 /usr/bin/python2 -s /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache.pid --logdir=/var/log/carbon/ start
carbon   12785     1  0 00:40 ?        00:00:00 /usr/bin/python2 -s /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache-b.pid --logdir=/var/log/carbon/ --instance=b start
root     13838  2227  0 00:41 pts/5    00:00:00 grep --color=auto carbon
carbon   15836     1  0 00:40 ?        00:00:00 /usr/bin/python2 -s /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache-c.pid --logdir=/var/log/carbon/ --instance=c start
carbon   17335     1  0 00:40 ?        00:00:00 /usr/bin/python2 -s /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache-d.pid --logdir=/var/log/carbon/ --instance=d start
carbon   18563     1  0 00:40 ?        00:00:00 /usr/bin/python2 -s /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache-e.pid --logdir=/var/log/carbon/ --instance=e start
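
The process list shows that the five carbon-cache processes exist, but not that each one is accepting connections. A minimal Python sketch of a port check follows; it simply tries to open a TCP connection to each port and records whether that succeeded. The function name probe_ports and the injectable connect parameter are assumptions made for this example.

```python
import socket
from typing import Callable, Dict, Iterable


def probe_ports(ports: Iterable[int], host: str = "127.0.0.1",
                timeout: float = 1.0,
                connect: Callable = socket.create_connection) -> Dict[int, bool]:
    """Return {port: True/False} depending on whether a TCP connect succeeds.

    `connect` defaults to socket.create_connection; it is a parameter only
    so the function can be exercised without real listeners.
    """
    status: Dict[int, bool] = {}
    for port in ports:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers refused connections and timeouts.
            status[port] = False
        else:
            conn.close()
            status[port] = True
    return status
```

With the configuration above, probe_ports([2003, 2103, 2113, 2123, 2133]) would check the line receiver port of each cache instance.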

I will try go-carbon asap.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.