quantopian / qgrid

An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks
Apache License 2.0

qgrid does not play well with 'categorical' dtype and large dataset #321

Open robertour opened 4 years ago

robertour commented 4 years ago

I am running qgrid 1.3.1 and pandas 1.0.3. I have a DataFrame of 300K rows, and qgrid handles it well.

However, when I convert some of the columns to categorical dtype, the display and the filters become very slow.

Here is a dummy example to replicate the problem:

Without categorical dtype (it should run very quickly):

import pandas as pd
import numpy as np
import qgrid

df = pd.DataFrame({'cat1':np.random.randint(low=1, high=1000000, size=400000), 
                  'cat2': np.random.randint(low=1, high=1000000, size=400000), 
                  'cat3': np.random.randint(low=1, high=1000000, size=400000), 
                  'cat4': np.random.randint(low=1, high=1000000, size=400000),
                  })

qgrid.show_grid(df)

When I convert the columns to categorical (silly in this example):

df2 = df.copy()
df2['cat1'] = df2['cat1'].astype('category')
df2['cat2'] = df2['cat2'].astype('category')
df2['cat3'] = df2['cat3'].astype('category')
df2['cat4'] = df2['cat4'].astype('category')
qgrid.show_grid(df2)

Then the display is very slow, and clicking on the filters (at the top of the columns) is even slower.
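
A possible workaround, sketched with pandas only and not verified against qgrid's internals: in the dummy example each column holds hundreds of thousands of distinct values, so the categorical version has almost as many categories as rows. Assuming the slowdown is tied to the number of categories (an assumption, not something confirmed here), one option is to cast high-cardinality categorical columns back to plain values before passing the frame to the grid. The helper name `decategorize_high_cardinality` and the `max_categories` threshold are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def decategorize_high_cardinality(df, max_categories=1000):
    """Return a copy of df in which categorical columns with more than
    max_categories categories are replaced by their plain values.

    Hypothetical helper: the threshold is arbitrary and should be tuned.
    """
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            n_categories = len(out[col].cat.categories)
            if n_categories > max_categories:
                # np.asarray materializes the category values for every row,
                # dropping the categorical wrapper.
                out[col] = np.asarray(out[col])
    return out


# Small demonstration: one high-cardinality and one low-cardinality column.
demo = pd.DataFrame({
    "many": pd.Series(np.arange(5000)).astype("category"),
    "few": pd.Series(["a", "b"] * 2500).astype("category"),
})
slim = decategorize_high_cardinality(demo, max_categories=1000)
print(slim.dtypes)
```

With the frames from the example above, this would be used as `qgrid.show_grid(decategorize_high_cardinality(df2))`. Columns with few categories keep their categorical dtype, so only the columns suspected of causing the slowdown are changed.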