Cassandra: do not use batch load

Cassandra batches are not a performance optimization; they exist only to ensure atomicity and isolation. Using batches to bulk-load data is a common design mistake.

see here and here

Benchmark results for my single-value use case:

With batches (batch size = 300): loaded 52358400 items in 347.260693sec with 10 workers (mean point rate 150775.486626/sec, mean value rate 150775.486626/s, 15.52MB/sec from stdin)

With individual inserts: loaded 52358400 items in 78.183677sec with 10 workers (mean point rate 669684.545983/sec, mean value rate 669684.545983/s, 68.92MB/sec from stdin)
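The speedup implied by the two runs can be worked out from the reported item count and wall-clock times. The short Python sketch below only redoes that arithmetic; the variable names are mine and not part of the benchmark tool.

```python
# Figures copied from the two benchmark runs reported in this post.
ITEMS = 52_358_400
BATCH_SECONDS = 347.260693   # run with batch size = 300
INSERT_SECONDS = 78.183677   # run with individual inserts

# Throughput is simply items divided by elapsed seconds.
batch_rate = ITEMS / BATCH_SECONDS
insert_rate = ITEMS / INSERT_SECONDS

# Ratio of the two throughputs (same item count, so it equals the time ratio).
speedup = insert_rate / batch_rate

print(f"batch:   {batch_rate:,.0f} items/sec")
print(f"insert:  {insert_rate:,.0f} items/sec")
print(f"speedup: {speedup:.2f}x")
```

So on this setup, individual inserts were roughly 4.4 times faster than batches of 300.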

Tested on a 2xlarge VM with an HDD (350 IOPS, 15 MB/s sequential I/O).

Table schema: X(tsuid text, time bigint, value double, PRIMARY KEY (tsuid, time))
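To make the two load strategies concrete, here is a minimal Python sketch that only builds the CQL text for table X, once as individual INSERT statements and once grouped into BEGIN BATCH ... APPLY BATCH blocks. The helper names and the sample points are hypothetical, and it does not talk to a cluster; it is not the code the benchmark loader uses. A real loader would send prepared statements through a driver rather than format values into strings.

```python
from typing import Iterable, List, Tuple

# (tsuid, time, value), matching the columns of table X.
Point = Tuple[str, int, float]

# Illustration only: real code should bind values via prepared statements.
INSERT_CQL = "INSERT INTO X (tsuid, time, value) VALUES ('{}', {}, {})"


def individual_inserts(points: Iterable[Point]) -> List[str]:
    """Return one INSERT statement per point."""
    return [INSERT_CQL.format(*p) + ";" for p in points]


def batched_inserts(points: List[Point], batch_size: int) -> List[str]:
    """Group points into BATCH statements of at most batch_size rows each."""
    statements = []
    for start in range(0, len(points), batch_size):
        rows = points[start:start + batch_size]
        body = "\n".join("  " + INSERT_CQL.format(*p) + ";" for p in rows)
        statements.append("BEGIN BATCH\n" + body + "\nAPPLY BATCH;")
    return statements


# Hypothetical sample data: 7 points spread over 3 different tsuid partitions.
points = [("cpu.host{}".format(i % 3), 1000 + i, 0.5 * i) for i in range(7)]

single = individual_inserts(points)
batches = batched_inserts(points, batch_size=3)

print(len(single), "individual statements;", len(batches), "batch statements")
print(batches[0])
```

Note that each batch here mixes rows with different tsuid values, that is, different partitions. As far as I understand, such a multi-partition batch has to be coordinated by a single node, which is one plausible reason batching hurts load throughput instead of helping.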

However, InfluxDB still has a large advantage in disk space usage.

influxdata / influxdb-comparisons
