Nope, the suggestion above is asking for trouble for files larger than the available memory.
I have instead started with an iterator, `for chunk in gdf.chunks(50000):`, which yields `pandas.DataFrame` objects of no more than (e.g.) 50000 rows each. You can set the chunk size as high as your computer's memory will allow.
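A minimal usage sketch, assuming `gdf` is an already-opened GDF2 object as above (the column name `"Elevation"` is made up for illustration): running totals let you compute an aggregate without ever holding the whole file in memory.

```python
total_elev = 0.0
n_rows = 0
for chunk in gdf.chunks(50000):
    # `chunk` is a pandas.DataFrame with at most 50000 rows
    total_elev += chunk["Elevation"].sum()
    n_rows += len(chunk)

print("mean elevation:", total_elev / n_rows)
```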
Or, if you want to look at each row of data in turn, you can abstract the chunking away with `for row in gdf.iterrows(50000)`. It still reads in chunks to avoid running out of memory, but yields one row at a time. The larger that number, the faster the I/O will be.
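For what it's worth, the row iteration described above can be sketched as a thin wrapper around the chunked reader; this is only an illustration of the idea, not the package's actual implementation.

```python
def iterrows(gdf, chunksize=50000):
    """Yield one row at a time while still doing I/O in blocks of `chunksize` rows."""
    for chunk in gdf.chunks(chunksize):
        # DataFrame.itertuples avoids the per-row Series overhead of DataFrame.iterrows
        for row in chunk.itertuples(index=False):
            yield row

for row in iterrows(gdf, 50000):
    pass  # process one row at a time
```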
I will close this issue when I document the API somewhere, probably in the README for now.
Documented
My feeling is we should have a unified GDF2 object with e.g.: