Open mccalluc opened 4 years ago
I would be surprised if the numeric dtypes were huge (but good to check!). However, in my experience people forget that casting a pandas column with many repeated entries (e.g. cell type) to categorical can lead to a much lower memory footprint. When saving to Arrow, I found that converting such columns to categorical had some nice benefits for the resulting Arrow size. I found it easiest to convert these types on the pandas.DataFrame and then let pyarrow take care of mapping them to Arrow-specific dtypes.
Trevor notes: