jpillora / csv-to-influxdb

Import CSV files into InfluxDB

When running parallel parsers, data gets lost #28

Open IlanZuckerman opened 6 years ago

IlanZuckerman commented 6 years ago

I took 3 CSV files and imported them in two ways, once in parallel and once sequentially (obviously not at the same time).

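I then connected to the database and checked the number of points written in each run. Going by the counts, weekend2 holds the parallel run and weekend3 the sequential run.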

select count(*) from weekend2

name: weekend2

time count_Latency count_allThreads count_bytes count_elapsed count_grpThreads count_label count_success count_timeStamp
0 4535141 4535141 4535141 4535141 4535141 4535141 4535141 4535141

select count(*) from weekend3

name: weekend3

time count_Latency count_allThreads count_bytes count_elapsed count_grpThreads count_label count_success count_timeStamp
0 5022860 5022860 5022860 5022860 5022860 5022860 5022860 5022860

Conclusion: 9.7% of points were lost during parallel execution (!)

Total number of rows in the CSVs: 5022863

Total number of points in the parallel run: 4535141

Total number of points in the sequential run: 5022860 (exactly the number of rows in the CSVs minus the 3 header lines)

4535141 / 5022860 ≈ 90.3%, meaning that about 9.7% of the points (487719) were lost.
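The arithmetic above can be double-checked with a short script. This is only a sketch for verifying the percentages: the counts are copied from the report, while the helper name `loss_summary` and the variable names are made up for illustration and are not part of csv-to-influxdb.

```python
def loss_summary(expected, written):
    """Return (points lost, percent lost) relative to the expected count."""
    lost = expected - written
    return lost, 100.0 * lost / expected


# Counts taken from the report above.
csv_rows = 5022863      # total rows across the 3 CSV files
header_rows = 3         # one header line per file
expected = csv_rows - header_rows

sequential = 5022860    # count(*) for weekend3
parallel = 4535141      # count(*) for weekend2

for label, written in (("sequential", sequential), ("parallel", parallel)):
    lost, pct = loss_summary(expected, written)
    print(f"{label}: {lost} points lost ({pct:.2f}%)")
```

The sequential run comes out with no missing points, while the parallel run is short by 487719 points, roughly 9.71% of the expected total.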