Dask appears to compute the cross product of categorical columns during groupby and, unlike pandas, does not let the user disable this. As a result, it runs out of memory even on the smaller 0.5 GB CSV data. Reported upstream: https://github.com/dask/dask/issues/7024
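For reference, the pandas switch referred to here is the `observed` argument of `groupby`. The following is a minimal sketch of its effect on a toy frame; the column names `id1`, `id2` and `v` are made up for illustration and are not taken from the benchmark data.

```python
import pandas as pd

# Hypothetical toy data: two categorical key columns with 3 categories each,
# but only 3 of the 9 possible key combinations actually occur.
df = pd.DataFrame({
    "id1": pd.Categorical(["a", "b", "c"]),
    "id2": pd.Categorical(["x", "y", "z"]),
    "v": [1, 2, 3],
})

# observed=False: the result holds one row per combination of categories,
# i.e. the full cross product (3 x 3), including combinations never seen.
full = df.groupby(["id1", "id2"], observed=False)["v"].sum()

# observed=True: the result holds only combinations present in the data.
seen = df.groupby(["id1", "id2"], observed=True)["v"].sum()

print(len(full), len(seen))
```

With high-cardinality categorical keys the cross product grows multiplicatively, which would be consistent with the out-of-memory behaviour described above.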