Closed: ecsumed closed this issue 5 years ago
Aren't you maxing out the relay (CPU)? If so, check out https://github.com/grobian/carbon-c-relay, which is much faster than the Python implementation and can use multiple cores. If you care about performance and you are building a new setup, you might as well start out with https://github.com/lomik/go-carbon.
Woah! @piotr1212 spot on. So the relay was causing the crashes, not the destination. Would a bigger server help here? In the meantime I'll check out the c-relay. I am testing on a new setup, but only to find out how many disks I'll need for my production setup, which I need to shard. A single disk is no longer feasible and is maxing out IO. My goal is a carbon queue of 0 while supporting 500k metrics every 10 minutes.
This project (the original Graphite) is all written in Python. Because of the global interpreter lock in CPython, a single Python process cannot run code on more than one core at a time. By a bigger server you most likely mean one with more cores. That would not make any difference, as the relay process cannot make use of those extra cores. Instead, you would need to run multiple relays and put a load balancer in front of them.
Some parts of Graphite have been rewritten in programming languages which do not have this limitation. Examples are carbon-c-relay, carbon-relay-ng and go-carbon. If you are building a new system I would go for those instead of the original Python implementation. The only original part to use would be graphite-web, as it still has no full-featured replacement (that I am aware of).
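To illustrate the idea of spreading load over several relay processes, here is a minimal Python sketch. It is not how carbon-relay or carbon-c-relay route metrics internally. The relay addresses and the names `RELAYS`, `pick_relay` and `spread` are hypothetical, and it uses a plain hash-modulo scheme rather than a consistent hash ring.

```python
import hashlib
from collections import Counter

# Hypothetical relay endpoints; these addresses are placeholders for illustration.
RELAYS = ["relay-a:2003", "relay-b:2003", "relay-c:2003", "relay-d:2003"]


def pick_relay(metric_name, relays=RELAYS):
    """Map a metric name to one relay using a stable hash of the name."""
    digest = hashlib.md5(metric_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(relays)
    return relays[index]


def spread(metric_names, relays=RELAYS):
    """Count how many metric names land on each relay."""
    return Counter(pick_relay(name, relays) for name in metric_names)


if __name__ == "__main__":
    # Made-up metric names: 1000 hosts with 8 metrics each.
    names = [f"servers.host{h}.metric{m}" for h in range(1000) for m in range(8)]
    for relay, count in sorted(spread(names).items()):
        print(relay, count)
```

Each relay would run as its own process, so each can occupy its own core. A real setup would more likely use a TCP load balancer or a relay with built-in clustering than hand-written hashing like this.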
Hey @piotr1212. Thanks for suggesting the relay and carbon variants. I ended up using c-relay with go-carbon and so far it's great. Thanks!
I'm load testing a new carbon setup running 1 relay to 3 carbons (all on different hosts). The load test runs 720k metrics every 150 seconds. Here are the graphs: https://imgur.com/Wn41iG2
Notice the discrepancies between relay metrics received and relay metrics sent. And the one time that the relay did send the full amount of metrics, the carbons only received a fraction of them.
Also notice the files created. Eventually there should be a total of 720k files (90k hosts x 8 metrics), but the curves flatten out. After about 40k files on each of the 3 carbon hosts, new files were created only rarely.
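For reference, the numbers in this load test work out as follows. This is only a back-of-the-envelope sketch in Python; the even split across the 3 carbon hosts is an assumption, since the actual distribution depends on the relay method in use.

```python
# Figures taken from the load test described above.
HOSTS = 90_000
METRICS_PER_HOST = 8
INTERVAL_SECONDS = 150
CARBON_HOSTS = 3
OBSERVED_FILES_PER_CARBON = 40_000  # approximate plateau seen in the graphs

total_metrics = HOSTS * METRICS_PER_HOST            # distinct metrics, one file each
rate_per_second = total_metrics / INTERVAL_SECONDS  # average datapoints per second

# Assumption: the relay spreads metrics evenly over the carbon hosts.
expected_files_per_carbon = total_metrics / CARBON_HOSTS
created_fraction = OBSERVED_FILES_PER_CARBON / expected_files_per_carbon

print(f"total metrics:             {total_metrics}")
print(f"average rate:              {rate_per_second:.0f} metrics/s")
print(f"expected files per carbon: {expected_files_per_carbon:.0f}")
print(f"fraction created so far:   {created_fraction:.1%}")
```

Under that assumption, each carbon host should end up with about 240k files, so a plateau around 40k means roughly one sixth of the expected files had been created.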
Here's my relay config:
And my 3 carbons config (all on different hosts):
Carbon cache is set to inf as I do not want any points to drop, so I'm not sure what's happening. The only anomaly I found was the relay complaining about the destinations (carbons) being down. The carbons are running fine though.
Version: 1.2.0
What am I missing?