Closed: kootenpv closed this issue 4 years ago.
Provide full repro code and minimal sample data.
Please search GitHub for CSV files larger than 50 MB; there are plenty of examples (I'm on my phone right now).
No, I meant for you to provide full repro code and a minimal, tiny data sample. I don't want the repo to become slower and harder to clone because of sample data; I'm more interested in the code you are using to debug the bug.
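For illustration, a self-contained repro of the kind being requested might look like the sketch below; the file name, sample data, and failing call are placeholders, not taken from the actual report:

```python
# Hypothetical minimal repro: tiny inline sample data plus the exact call
# that misbehaves. Everything here is a placeholder for the real report.
from faster_than_csv import csv2list

SAMPLE = "a,b,c\n1,2,3\n4,5,6\n"

with open("mini.csv", "w") as f:
    f.write(SAMPLE)

print(csv2list("mini.csv"))  # replace with the call that actually fails
```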
Also, other libs may have improved since then; when this lib started, it was a lot faster than the others and required no dependencies, while the others had tons of dependencies. I think that matters nowadays, when everyone uses Docker and Alpine. If other libs have improved, good for them.
Ah, my bad. I just compared `csv2list` (3 s) against `pd.read_csv` (1 s) on an 80 MB data file (which I cannot share). `csv2dict` crashed on it.
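A rough sketch reconstructing that comparison might look like this; it assumes the `faster_than_csv` module exposes `csv2list`/`csv2dict` as named in this thread, and `data.csv` stands in for the unshared 80 MB file:

```python
# Benchmark sketch under the assumptions stated above; timings will vary.
import time

import pandas as pd
from faster_than_csv import csv2dict, csv2list

PATH = "data.csv"  # placeholder for the real 80 MB file

t0 = time.perf_counter()
rows = csv2list(PATH)
print(f"csv2list:    {time.perf_counter() - t0:.2f}s")

t0 = time.perf_counter()
df = pd.read_csv(PATH)
print(f"pd.read_csv: {time.perf_counter() - t0:.2f}s")

try:
    csv2dict(PATH)  # the call reported above to crash on this file
except Exception as exc:
    print(f"csv2dict crashed: {exc!r}")
```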
Try the new version; it should be a little better:

`sudo pip3 install faster-than-csv==edca7e6 --no-binary :all:`

You can pass the columns count argument to get slightly better performance, or leave it as 0 otherwise. `csv2list()` should work now.
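For reference, a hedged sketch of how that argument might be passed; the parameter name `columns` is inferred from the comment above, and the exact signature may differ:

```python
# Sketch only: the column-count parameter name is an assumption inferred
# from the comment above; consult the library's docs for the real signature.
from faster_than_csv import csv2list

rows = csv2list("data.csv", columns=12)  # pre-declared column count
rows = csv2list("data.csv", columns=0)   # 0: let the library detect it
```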
It would be a good idea to include a benchmark with larger, more heterogeneous data.
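For example, a benchmark file with mixed column types could be generated with a stdlib sketch like this; the row count, columns, and value ranges are all arbitrary, illustrative choices:

```python
# Sketch for generating a larger, heterogeneous benchmark CSV; the row
# count, columns, and value ranges are illustrative assumptions.
import csv
import random
import string

with open("bench.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "name", "price", "flag", "comment"])
    for i in range(1_000_000):  # yields tens of MB, depending on row width
        w.writerow([
            i,
            "".join(random.choices(string.ascii_letters, k=8)),
            round(random.uniform(0, 1000), 2),
            random.choice(["true", "false"]),
            'quoted, "tricky" text' if i % 100 == 0 else "plain",
        ])
```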