batch size = 300
loaded 52358400 items in 347.260693sec with 10 workers (mean point rate 150775.486626/sec, mean value rate 150775.486626/s, 15.52MB/sec from stdin)
use insert
loaded 52358400 items in 78.183677sec with 10 workers (mean point rate 669684.545983/sec, mean value rate 669684.545983/s, 68.92MB/sec from stdin)
test with HDD (IOPS 350, 15M/s sequence IO) 2xlarge VM.
table schema: X(tsuid TEXT, time bigint, value double, primary key(tsuid, time))
But influxdb still has a huge advantage on disk space usage.
Cassandra batch is not for improving performance. only for ensuring atomicity and isolation. use batch to load data is a common bad design.
see here and here
In my single value use case, the benchmark:
batch size = 300
loaded 52358400 items in 347.260693sec with 10 workers (mean point rate 150775.486626/sec, mean value rate 150775.486626/s, 15.52MB/sec from stdin)
use insert
loaded 52358400 items in 78.183677sec with 10 workers (mean point rate 669684.545983/sec, mean value rate 669684.545983/s, 68.92MB/sec from stdin)
test with HDD (IOPS 350, 15M/s sequence IO) 2xlarge VM.
table schema:
X(tsuid TEXT, time bigint, value double, primary key(tsuid, time))
But influxdb still has a huge advantage on disk space usage.