scylladb / scylla-bench


Inserting data way too slow. #68

Open pveentjer opened 3 years ago

pveentjer commented 3 years ago

I'm using the following command to insert data:

go/bin/scylla-bench  -workload sequential -mode write -partition-count 10000 -nodes 172.31.24.133  2>&1 | tee -a scylla-bench-12-08-2021_15-03-19.log

I can see in Scylla Monitor and in the command-line output that about 22K operations/second are performed.

  39s   21738   21738      0 1.4ms  1ms    885µs  819µs  786µs  754µs  735µs  
  40s   21803   21803      0 1.5ms  1.1ms  852µs  819µs  786µs  754µs  733µs  
  41s   21779   21779      0 1.5ms  1ms    885µs  819µs  786µs  754µs  734µs  
  42s   21714   21714      0 1.5ms  1ms    885µs  819µs  786µs  754µs  736µs  
  43s   21682   21682      0 1.4ms  1ms    885µs  819µs  786µs  754µs  737µs  
  44s   21713   21713      0 1.6ms  1.3ms  918µs  819µs  786µs  754µs  736µs  
  45s   21766   21766      0 1.5ms  1.1ms  885µs  819µs  786µs  754µs  734µs  
  46s   21765   21765      0 1.4ms  1.1ms  885µs  819µs  786µs  754µs  734µs  

At that rate, inserting 10,000 partitions should complete in about half a second.

But in reality it runs for 46 seconds.

I also calculated the insert throughput manually (partition-count divided by elapsed time) and got about 210 inserts/second. When I increase the partition count to 20K or 30K, the manually calculated throughput stays constant at about 210 inserts/second.

So it seems there is roughly a factor of 100 difference between the manually calculated insertion rate and the writes/second listed by Scylla Monitor and scylla-bench.

pveentjer commented 3 years ago

When I add -clustering-row-count 1, the problem is resolved.

go/bin/scylla-bench -workload sequential -clustering-row-count 1 -mode write -partition-count 40000 -partition-offset 0 -nodes 172.31.24.133 2>&1 | tee -a scylla-bench-12-08-2021_15-39-25.log

Inserting 20K partitions finishes very quickly.

michoecho commented 3 years ago

> So it seems there is roughly a factor of 100 difference between the manually calculated insertion rate and the writes/second listed by Scylla Monitor and scylla-bench.
>
> When I add -clustering-row-count 1, the problem is resolved.

That's because the default is -clustering-row-count 100, so 100 rows are inserted for each partition. You get 210 partition inserts per second, which equals 21,000 row inserts per second.
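The arithmetic behind this explanation can be checked with a short sketch. It assumes the reported rate counts rows rather than partitions, as described above; the variable and function names are illustrative only and are not scylla-bench identifiers, and the rate is rounded from the output in the first comment.

```python
# Values taken from the thread; names are hypothetical, not scylla-bench identifiers.
partition_count = 10_000        # -partition-count in the first command
clustering_row_count = 100      # default -clustering-row-count, per the explanation above
reported_rows_per_sec = 21_700  # approximate rate shown in the scylla-bench output


def expected_runtime_seconds(partitions, rows_per_partition, rows_per_sec):
    """Time needed to write every row of every partition at the given row rate."""
    total_rows = partitions * rows_per_partition
    return total_rows / rows_per_sec


def partitions_per_second(partitions, runtime_seconds):
    """Manually calculated throughput: partitions divided by elapsed time."""
    return partitions / runtime_seconds


runtime = expected_runtime_seconds(
    partition_count, clustering_row_count, reported_rows_per_sec
)
print(f"expected runtime: {runtime:.1f} s")
print(f"partition rate: {partitions_per_second(partition_count, runtime):.0f} partitions/s")
```

Under these assumptions the run takes about 46 seconds at roughly 217 partitions per second, which is in line with the observed 46 seconds and the manually calculated rate of about 210 inserts/second. With one row per partition, the same row rate gives a runtime of about half a second.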

pveentjer commented 3 years ago

Why not set the default to 1? That would make more sense.