CSV reading isn't fully iterative

dracos commented 7 years ago

If I've understood the code correctly, calling get_data with a file-like object and the CSV file type can almost read the data iteratively, without loading the entire file into memory. The one obstacle is that _load_from_stream does a full read() at https://github.com/pyexcel/pyexcel-io/blob/1cffd9d2edbe8decc30968281934fcfd6a3ad774/pyexcel_io/fileformat/_csv.py#L269 in order to look for separators. If that step could be made optional (for example, when you know the file contains no separators), the process would be fully iterative and would only read from the file as you loop through it, which would be useful for extremely large files.
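To illustrate the difference, here is a minimal sketch using only the standard library csv module, not pyexcel-io's actual code. The function name iter_rows and the flag scan_whole_stream are hypothetical, made up for this example, and io.StringIO stands in for a large file so that tell() can show how much of the stream has been consumed.

```python
import csv
import io


def iter_rows(stream, scan_whole_stream=True):
    """Yield CSV rows from a text stream.

    `scan_whole_stream` is a hypothetical flag for this sketch only.
    When True, the stream is read in full first (the pattern described
    in this issue); when False, csv.reader pulls lines on demand.
    """
    if scan_whole_stream:
        content = stream.read()  # the entire input is now held in memory
        # A real implementation would inspect `content` for separators here.
        stream = io.StringIO(content)
    return csv.reader(stream)


def make_stream(n_rows=1000):
    """Build an in-memory stand-in for a large CSV file."""
    return io.StringIO("".join(f"{i},{i * 2}\n" for i in range(n_rows)))


# Eager path: asking for one row still consumes the whole stream.
eager_stream = make_stream()
first_eager = next(iter_rows(eager_stream, scan_whole_stream=True))
eager_consumed = eager_stream.tell()

# Lazy path: asking for one row consumes only the first line.
lazy_stream = make_stream()
first_lazy = next(iter_rows(lazy_stream, scan_whole_stream=False))
lazy_consumed = lazy_stream.tell()

print(first_eager, eager_consumed)
print(first_lazy, lazy_consumed)
```

Both calls return the same first row, but only the lazy path leaves the rest of the stream unread. With a real file object the operating system and Python buffer reads in chunks, so somewhat more than one line is fetched at a time, but memory use still stays bounded rather than growing with file size.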

chfw commented 7 years ago

I will look at how to iterate content when it is in memory.

chfw commented 7 years ago

Please evaluate the fixes and let me know how it goes.

dracos commented 7 years ago

Thank you for this, making the multiple streams optional is great :)

CSV reading isn't fully iterative #33