I noticed that a significant amount of time is spent in read syscalls inside the CSV reader.
This change buffers reads from the input file to reduce the number of syscalls.
I set the buffer size empirically, starting from 4kB and increasing it until I stopped seeing improvements.
cat data.csv | head -n 1000000 | ./mmdbctl import --csv --ip 6 --alias-6to4 --no-network --disallow-reserved > /dev/null
# Before: 25% of time spent in csv.Read
# After: 10% of time spent in csv.Read
CPU profiles: original, buffered