ui.aggrid.from_pandas unable to render pivot data

Wzhipeng commented 9 months ago

Description

when my dataframe is pivot data, ui.aggrid.from_pandas unable to render

falkoschindler commented 9 months ago

Hi @Wzhipeng, I just tried an example from https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html and it seems to work correctly:

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

table = pd.pivot_table(df, values='D', index=['A', 'B'],
                       columns=['C'], aggfunc="sum")

ui.aggrid.from_pandas(table)

Can you, please, share a minimum reproducible example of the problem? Thanks!

Wzhipeng commented 9 months ago

df = pd.DataFrame({
        "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
        "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
cols = ['E']
table = df.groupby('D')[cols].agg(['count', 'sum', 'mean', 'min', 'max', 'std'])

falkoschindler commented 9 months ago

Thanks, @Wzhipeng!

I needed to consult ChatGPT to understand what is happening with these pivot tables:

The error you're encountering is due to the way pandas creates multi-indexed columns after aggregation. When you aggregate the data in your DataFrame with agg(['count']), pandas creates a multi-level column index. When you then convert this DataFrame to a dictionary with to_dict('records') [that's what happens inside from_pandas], the keys of this dictionary are tuples representing the multi-level index, in this case, e.g. ('E', 'count').

The json.dumps() function [this is called when sending the grid to the client] can't serialize these tuples directly because JSON keys must be strings, not tuples. This is the cause of the TypeError you're seeing.

To fix this, you need to flatten the multi-level column index into a single level before converting the DataFrame to a dictionary. Here's how you can modify your code to achieve this:
table.columns = ['_'.join(col) for col in table.columns.values]

With this line, the AG Grid renders successfully:

ui.aggrid.from_pandas(table)

I'm not sure what we can do to avoid the exception in the first place. It feels like a rather special use case and it's not inherently clear what to do with such multi-level indices. Maybe we can detect them and raise a more meaningful exception.

falkoschindler commented 9 months ago

We could add this line to from_pandas:

        if isinstance(df.columns, pd.MultiIndex):
            raise ValueError('MultiIndex columns are not supported. '
                             'You can convert them to strings using something like '
                             '`df.columns = ["_".join(col) for col in df.columns.values]`.')

The same change should be made to ui.table, since it probably has the same limitation.

Wzhipeng commented 9 months ago

def dataframe_names(df):
    if isinstance(df.index.names, str):
        return [df.index.name]
    else:
        return df.index.names

def dataframe_col_names(col):
    if isinstance(col, (int, float, str)):
        return [col]
    else:
        return col

def pivot_dataframe_column_setting(df: DataFrame):
    renameColumns = {}

    def build_tree(tuples_list):
        result = []
        for tpl in tuples_list:
            col_names = dataframe_col_names(tpl)
            current_level = result
            for elem in col_names:
                renameColumns[str(col_names)] = "-".join(map(str, col_names))
                match = next((item for item in current_level if item['headerName'] == elem), None)

                if match:
                    current_level = match.setdefault('children', [])
                else:
                    new_node = {'headerName': str(elem), 'headerClass': 'bg-sky-200'}
                    current_level.append(new_node)
                    current_level = new_node.setdefault('children', [])

        def set_header_name(node, path=None):
            if path is None:
                path = []
            if 'children' in node:
                for child in node['children']:
                    set_header_name(child, path + [node['headerName']])
                node['field'] = '-'.join(path + [node['headerName']])
            else:
                node['field'] = node['headerName']

        for item in result:
            set_header_name(item)

        return result

    columnDefs = build_tree(df.columns)

    def remove_children(node):
        if 'children' in node and not node['children']:
            del node['children']
        else:
            for child in node.get('children', []):
                remove_children(child)

    for item in columnDefs:
        remove_children(item)

    return columnDefs, renameColumns

column_defs, renameColumns = pivot_dataframe_column_setting(df)
df_adjusted = DataFrame(df.to_records(index=index_not_none)).rename(columns=renameColumns)
grid = ui.aggrid({
    'columnDefs': column_defs,
    'rowData': df_adjusted.to_dict('records'),
    'suppressDotNotation': True,
}, theme="alpine", auto_size_columns=False).style("--ag-borders: none")
grid.options['defaultColDef'] = {
    # 'flex': 1,
    'sortable': True,
    'resizable': True,
    'filter': True,
    'editable': True,
    'headerClass': 'bg-sky-200',
}
grid.options['pagination'] = True
grid.options['paginationPageSize'] = page_size

@falkoschindler I finally wrote it myself to generate aggrid's columnDefs and rowData through dataframe. Because in fact, the header of the table displayed at the end is nested, otherwise filtering cannot be performed.

Wzhipeng commented 9 months ago

But I think this method may have bugs in some scenarios, but I tried a lot of pivot or group by, and it worked fine.

zauberzeug / nicegui

ui.aggrid.from_pandas unable to render pivot data #2384

Description