parsecsv / parsecsv-for-php

CSV data parser for PHP.
MIT License
681 stars, 176 forks

TOO SLOW!!! #212

Closed MakkiAbid closed 2 months ago

MakkiAbid commented 2 years ago

Parsing is too slow when I import millions of records... Does the library use LOAD DATA INFILE for the queries or not???

Kr3m commented 2 years ago

Why would there be a SQL operation on a basic PHP CSV parser? I haven't checked, but I wouldn't assume it uses it either. You could easily answer your own question by searching the code.

UPDATE: It does not. Loading millions of records on the fly isn't realistic anyway; you should split them into separate files. Refer to this article: https://www.theworkshop.com/en/blog/how-to-load-millions-of-rows-into-mysql-without-agony/
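One way to follow the "separate files" advice is sketched below. It is a minimal Python illustration, not part of parsecsv-for-php, and it assumes the first row is a header that should be repeated in every chunk. The function name `split_csv`, the `chunk_NNNNN.csv` file naming, and the chunk size are hypothetical choices.

```python
import csv
import os


def split_csv(src_path, out_dir, rows_per_file):
    """Split src_path into files holding at most rows_per_file data rows each.

    Assumption: the first row is a header; it is copied into every chunk.
    Rows are streamed, so only the current row is kept in memory.
    Returns the list of chunk paths in the order they were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return paths  # empty input: nothing to split
        out = None
        writer = None
        count = 0
        for row in reader:
            # Open a new chunk when none is open yet or the current one is full.
            if out is None or count == rows_per_file:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"chunk_{len(paths):05d}.csv")
                paths.append(path)
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                count = 0
            writer.writerow(row)
            count += 1
        if out is not None:
            out.close()
    return paths
```

Each chunk can then be imported on its own; with MySQL's `LOAD DATA INFILE`, adding `IGNORE 1 LINES` skips the repeated header row.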

jimeh commented 2 years ago

Loading a CSV file with millions of records is unlikely to be fast, and it will use lots of memory, since, from what I recall, the parser loads the whole data set into memory.

In theory streaming records or fields one at a time would be possible, but I believe it would entail a major departure from the current codebase.
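The streaming idea can be illustrated outside the library. The sketch below uses Python's standard `csv` module, not parsecsv-for-php, and the function name `column_mean` and column name `temp` are hypothetical. It reads one record at a time and keeps only running totals, so memory use does not grow with the number of rows. In PHP, the same pattern is a loop over `fgetcsv()`.

```python
import csv
from typing import Iterable, Optional


def column_mean(lines: Iterable[str], column: str) -> Optional[float]:
    """Stream a CSV once and average one numeric column.

    csv.DictReader pulls rows lazily from `lines` (e.g. a file opened with
    newline=""), so only the current record is held in memory.
    Returns None when the column has no non-empty values.
    """
    total = 0.0
    count = 0
    for record in csv.DictReader(lines):
        value = record.get(column)
        if value:  # skip missing or empty cells
            total += float(value)
            count += 1
    return total / count if count else None
```

Typical use would be `with open("weather.csv", newline="") as f: print(column_mean(f, "temp"))`, where `weather.csv` stands in for any large file.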

stevleibelt commented 2 years ago

Hello @MakkiAbid,

please try not to be so rude next time.

As @Kr3m and @jimeh already mentioned, neither handling "millions of records" (I suspect you are counting rows without taking into account how much data each row holds) nor doing something in SQL like LOAD DATA INFILE is a problem this component aims to solve.

I would suggest closing this issue, @jimeh.

toutjavascript commented 8 months ago

Hello, I was attracted by this CSV parser, but it is too slow to handle 1 million rows (weather data).

stevleibelt commented 8 months ago

@toutjavascript 1 million rows is not what I would call a "simple task". Could you be more precise about what "too slow" means? On what kind of machine, and how many seconds did it take? Was I/O really the bottleneck? What did you do with your data?

toutjavascript commented 8 months ago

Hello, it takes about 5 minutes on a Mac M1 Max for a CSV with 1 million lines and about 30 columns. I think the offset is not handled well: reading the first blocks is faster than reading the last blocks.
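Later blocks being slower than earlier ones is what you would expect if each block is located by re-parsing the file from the top. The thread does not establish whether parsecsv-for-php actually does this; the Python sketch below, with hypothetical function names, only contrasts the two strategies. Re-scanning parses every skipped row again for each block, which is quadratic in the row count overall, while a single reader that keeps its position parses each row exactly once.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List


def read_block_by_rescan(lines: List[str], offset: int, limit: int) -> List[List[str]]:
    """Return rows [offset, offset + limit) by parsing from the top each time.

    The rows before `offset` are parsed and discarded, so block k costs about
    k * limit rows of work; reading every block this way is O(n^2) overall.
    (With a real file this means reopening or rewinding it for each block.)
    """
    reader = csv.reader(lines)
    return list(islice(reader, offset, offset + limit))


def iter_blocks(lines: Iterable[str], limit: int) -> Iterator[List[List[str]]]:
    """Yield consecutive blocks from one reader that keeps its position,
    so each row is parsed exactly once (O(n) overall)."""
    reader = csv.reader(lines)
    while True:
        block = list(islice(reader, limit))
        if not block:
            return
        yield block
```

Both functions return the same rows for a given block; only the amount of repeated parsing differs.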

toutjavascript commented 8 months ago

If you want to test on a real file that I have to process (one of hundreds): https://www.data.gouv.fr/fr/datasets/r/c79aaafe-8017-4d2b-8884-57b5391da5bc

gogowitsch commented 2 months ago

Maintainer here: Pull requests to improve this library are welcome. If speed is improved and all tests still pass: cool! The aim is to continue supporting old PHP versions, though.

I'm closing this issue now. Feel free to suggest faster libraries for future readers.