pyexcel / pyexcel-io

One interface to read and write the data in various excel formats, import the data into and export the data from databases
http://io.pyexcel.org

CSV reading isn't fully iterative #33

Closed dracos closed 7 years ago

dracos commented 7 years ago

If you call get_data with a file-like object and the CSV file type then, if I've understood the code correctly, the data can almost be read iteratively, without loading the entire file into memory. The one obstacle is that _load_from_stream does a full read() at https://github.com/pyexcel/pyexcel-io/blob/1cffd9d2edbe8decc30968281934fcfd6a3ad774/pyexcel_io/fileformat/_csv.py#L269 in order to look for separators. If that could be made optional (for example, when you know the data contains no separators), the process would be fully iterative and would only read from the file as you looped through it, which would be useful for extremely large files.
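To illustrate the behaviour being asked for, here is a minimal sketch that uses only the standard library's csv module, not pyexcel-io itself. The helper names iter_csv_rows and counting_lines are hypothetical and exist only for this example. It assumes the stream holds a single sheet with no separators, so no up-front read() is needed and lines are pulled from the stream only as rows are consumed.

```python
import csv
import io
from itertools import islice


def iter_csv_rows(lines, **fmtparams):
    """Yield parsed rows one at a time from an iterable of text lines.

    Hypothetical helper: it never calls read(), so the underlying
    stream is only consumed as the caller asks for more rows.
    """
    yield from csv.reader(lines, **fmtparams)


def counting_lines(stream, counter):
    """Pass lines through unchanged while counting how many were pulled."""
    for line in stream:
        counter["lines"] += 1
        yield line


# Stand-in for a very large file: 1000 rows held in an in-memory stream.
stream = io.StringIO("".join(f"{i},{i * 2}\n" for i in range(1000)))
counter = {"lines": 0}

# Take only the first three rows; the remaining lines are never touched.
rows = iter_csv_rows(counting_lines(stream, counter))
first_three = list(islice(rows, 3))

print(first_three)
print(counter["lines"])
```

With the separator scan skipped, stopping after three rows means only three lines have been taken from the stream, which is the property that matters for extremely large files.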

chfw commented 7 years ago

I will look at how to iterate content when it is in memory.

chfw commented 7 years ago

Please evaluate the fixes and let me know how it goes.

dracos commented 7 years ago

Thank you for this, making the multiple streams optional is great :)