AKheli closed this issue 3 years ago
Hi @AbdelouahabKhelifati
Line protocol is probably the fastest way to insert data into Graphite, but implementations can differ, of course. Maybe increasing parallelism would help. Or maybe Graphite is not the right tool for your task and you need a more advanced TSDB, such as QuestDB, TimescaleDB, or InfluxDB. My choice for any heavy analytics workload nowadays would be QuestDB.
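To illustrate the parallelism suggestion, here is a minimal sketch that shards datapoints across several workers, each of which would hold its own connection to carbon's line receiver. The sharding scheme, worker count, and metric names are assumptions for illustration, not code from this thread:

```python
# Hedged sketch of the "increase parallelism" idea: round-robin the
# datapoints across workers, each with its own carbon connection.
from concurrent.futures import ThreadPoolExecutor

def shard(points, n_workers):
    """Deal points round-robin into n_workers independent work lists."""
    shards = [[] for _ in range(n_workers)]
    for i, point in enumerate(points):
        shards[i % n_workers].append(point)
    return shards

def send_shard(points):
    # In a real loader each worker would open its own socket to carbon's
    # line receiver (port 2003 by default) and sock.sendall() its points
    # in batches; here we just count them so the sketch runs standalone.
    return len(points)

points = [("metric.%d" % i, float(i), 1609459200 + i) for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    sent = list(pool.map(send_shard, shard(points, 4)))
print(sum(sent))  # every point lands in exactly one shard
```

Threads are a reasonable fit here because the work is I/O-bound (socket writes); for CPU-heavy CSV parsing, multiple processes would be the next step.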
Thanks deniszh.
Would you have a source for the claim that line protocol is the fastest way to load data into Graphite?
My only source is my experience: I remember people trying both the line and pickle protocols, and line was faster. And those are the only two protocols that exist. Writing directly into the Whisper files is theoretically the fastest option, but it would require much more code and data transformation.
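For reference, here is a sketch of how the same datapoints are framed under the two carbon ingestion protocols. The metric names and timestamps are made up; carbon's defaults put the line receiver on port 2003 and the pickle receiver on port 2004:

```python
# Sketch: the same datapoints framed for carbon's two receivers.
import pickle
import struct

points = [
    ("sensors.device1.temp", (1609459200, 21.5)),
    ("sensors.device1.humidity", (1609459200, 40.0)),
]

# Line protocol: one "metric value timestamp\n" string per datapoint.
line_payload = "".join(
    f"{metric} {value} {ts}\n" for metric, (ts, value) in points
).encode("utf-8")

# Pickle protocol: a pickled list of (metric, (timestamp, value)) tuples,
# prefixed with a 4-byte big-endian length header.
body = pickle.dumps(points, protocol=2)
pickle_payload = struct.pack("!L", len(body)) + body

# Either payload would then be sent with sock.sendall() to the matching port.
```

The pickle protocol saves the server from parsing text, but the line protocol's payload is trivial to generate, which is one reason batch loaders often stick with it.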
You'll probably have to use the whisper library directly https://github.com/graphite-project/whisper, if that is not fast enough there is also a golang implementation
Thank you deniszh and piotr1212!
I am trying to load 100 billion multi-dimensional time series datapoints into Graphite from a CSV file with the following format:
I tried to find a fast loading method in the official documentation; here's how I am currently doing the insertion (my codebase is in Python):
As the code above shows, I read the dataset CSV file, prepare batches of 5000 datapoints, and send each batch with `sock.sendall`. However, this method is not very efficient. I am trying to load 100 billion datapoints and it is taking far longer than expected: loading only 5 million rows with 1500 columns each has been running for 40 hours and still has 15 hours to go:
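To make the approach described above concrete, here is a minimal sketch of a batched line-protocol sender. The CSV layout, metric naming, host, and port are assumptions, not the actual code from this issue:

```python
# Hedged sketch of the batching approach described above: group
# (metric, value, timestamp) rows into line-protocol payloads of 5000.
import socket

CARBON_HOST, CARBON_PORT = "127.0.0.1", 2003
BATCH_SIZE = 5000

def to_batches(rows):
    """Yield line-protocol payloads of up to BATCH_SIZE datapoints each."""
    batch = []
    for metric, value, ts in rows:
        batch.append(f"{metric} {value} {ts}\n")
        if len(batch) == BATCH_SIZE:
            yield "".join(batch).encode("utf-8")
            batch = []
    if batch:
        yield "".join(batch).encode("utf-8")

def send_rows(rows):
    # One long-lived connection; sendall() blocks until the whole
    # payload is handed to the kernel.
    with socket.create_connection((CARBON_HOST, CARBON_PORT)) as sock:
        for payload in to_batches(rows):
            sock.sendall(payload)
```

Separating batch construction from the socket write makes it easy to later fan the batches out to multiple connections.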
Is there a better way to load the dataset into Graphite?