The existing notebook reads string columns from CSV files using dtype="category", which breaks with the latest cuDF 0.14 nightly. This PR fixes the error by reading those columns as their original dtype, str, and hashing the values to an equivalent int column.
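For reviewers who want the idea in isolation, here is a minimal sketch of the read-as-string-then-hash approach. It is not the notebook's code: it uses only the Python standard library instead of cuDF, and the sample data, the column names, the helper names (`hash_to_int`, `read_and_hash`), and the choice of CRC32 as the hash are all assumptions made for illustration.

```python
import csv
import io
import zlib

# Hypothetical sample data; the real notebook reads its own CSV files.
CSV_TEXT = "user,city\nalice,Paris\nbob,Lima\ncarol,Paris\n"


def hash_to_int(value: str) -> int:
    """Map a string to a deterministic, non-negative 32-bit integer.

    CRC32 is an assumption for this sketch, not the hash used in the PR.
    """
    return zlib.crc32(value.encode("utf-8"))


def read_and_hash(text: str, string_columns: list) -> list:
    """Read CSV rows as strings, then replace the named columns with int hashes."""
    rows = []
    # csv.DictReader yields every field as str, i.e. the column's original dtype.
    for row in csv.DictReader(io.StringIO(text)):
        for col in string_columns:
            row[col] = hash_to_int(row[col])
        rows.append(row)
    return rows


# Hash only the "city" column; "user" stays a plain string.
rows = read_and_hash(CSV_TEXT, ["city"])
```

Equal strings map to the same integer, so the hashed column can still be used for grouping or joining the way the categorical column was. Distinct strings can collide under any fixed-width hash, which is worth keeping in mind for high-cardinality columns.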