kinverarity1 / aseg_gdf2

Python code to help read ASEG GDF2 packages
https://pypi.org/project/aseg-gdf2/
MIT License

Standardise API #1

Closed: kinverarity1 closed this issue 6 years ago

kinverarity1 commented 6 years ago

My feeling is we should have a unified GDF2 object with e.g.:

>>> gdf = aseg_gdf2.read('blah')
>>> gdf.record_types
{'': {'fields': [...], 'format': ...},
 'COMM': {'fields': [...], 'format': ...}}
>>> gdf.data
{'': ...,
 'COMM': ...,
 'PROJ': ...}
>>> gdf.data['']   # the empty string would represent RT=  (null type)
{'Col1': [...],
 '__array__': [[...], [...], ...]}
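For illustration, here is a minimal sketch of how such an object might assemble the data mapping for files that fit in memory, assuming each record carries a leading RT code field. The fixed-width prefix, the rt_codes tuple, and the group_records_by_rt name are assumptions of this sketch, not the library's internals:

from collections import defaultdict

def group_records_by_rt(dat_lines, rt_codes=('COMM', 'PROJ')):
    """Group raw .dat lines into {rt_code: [line, ...]}; '' is the null RT."""
    data = defaultdict(list)
    for line in dat_lines:
        prefix = line[:4].strip()   # assumed fixed-width RT field (illustrative)
        rt = prefix if prefix in rt_codes else ''
        data[rt].append(line)
    return dict(data)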
kinverarity1 commented 6 years ago

Nope, the suggestion above is asking for trouble for files larger than available memory.

I have instead started with an iterator, for chunk in gdf.chunks(50000):, which yields a pandas.DataFrame of no more than e.g. 50000 rows at a time. You can set the chunk size as high as your computer's memory will allow.
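For illustration, a minimal sketch of how a chunked reader like this could sit on top of pandas, assuming the .dat file is fixed-width and the column specs have already been parsed from the .dfn file. The colspecs and names parameters here are assumptions of the sketch, not the library's internals:

import pandas as pd

def chunks(dat_filename, colspecs, names, chunksize=50000):
    """Yield DataFrames of at most `chunksize` rows; memory use stays bounded."""
    # read_fwf with chunksize returns an iterator of DataFrames rather
    # than loading the whole file at once
    yield from pd.read_fwf(dat_filename, colspecs=colspecs,
                           names=names, chunksize=chunksize)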

Or, if you want to look at each row of data in turn, you can abstract away the chunking by using for row in gdf.iterrows(50000). It will still read in chunks to save you from running out of memory, but yield one row at a time. The higher that number, the faster the I/O will be.
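Row iteration can then be layered on top of the chunked reader. A minimal sketch, reusing the hypothetical chunks() above; itertuples is used here for speed, though the actual library may yield rows in a different form:

def iterrows(dat_filename, colspecs, names, chunksize=50000):
    """Yield one row at a time while still doing I/O chunk by chunk."""
    for chunk in chunks(dat_filename, colspecs, names, chunksize):
        # itertuples(index=False) yields one namedtuple per row and is
        # much faster than DataFrame.iterrows
        yield from chunk.itertuples(index=False)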

I will close this issue when I document the API somewhere, probably in the README for now.

kinverarity1 commented 6 years ago

Documented