Closed Wzhipeng closed 9 months ago
Hi @Wzhipeng, I just tried an example from https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html and it seems to work correctly:
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
"bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"C": ["small", "large", "large", "small",
"small", "large", "small", "small",
"large"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
table = pd.pivot_table(df, values='D', index=['A', 'B'],
columns=['C'], aggfunc="sum")
ui.aggrid.from_pandas(table)
Can you, please, share a minimum reproducible example of the problem? Thanks!
df = pd.DataFrame({
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
cols = ['E']
table = df.groupby('D')[cols].agg(['count', 'sum', 'mean', 'min', 'max', 'std'])
Thanks, @Wzhipeng!
I needed to consult ChatGPT to understand what is happening with these pivot tables:
The error you're encountering is due to the way pandas creates multi-indexed columns after aggregation. When you aggregate the data in your DataFrame with
agg(['count'])
, pandas creates a multi-level column index. When you then convert this DataFrame to a dictionary withto_dict('records')
[that's what happens insidefrom_pandas
], the keys of this dictionary are tuples representing the multi-level index, in this case, e.g.('E', 'count')
.The
json.dumps()
function [this is called when sending the grid to the client] can't serialize these tuples directly because JSON keys must be strings, not tuples. This is the cause of the TypeError you're seeing.To fix this, you need to flatten the multi-level column index into a single level before converting the DataFrame to a dictionary. Here's how you can modify your code to achieve this:
table.columns = ['_'.join(col) for col in table.columns.values]
With this line, the AG Grid renders successfully:
ui.aggrid.from_pandas(table)
I'm not sure what we can do to avoid the exception in the first place. It feels like a rather special use case and it's not inherently clear what to do with such multi-level indices. Maybe we can detect them and raise a more meaningful exception.
We could add this line to from_pandas
:
if isinstance(df.columns, pd.MultiIndex):
raise ValueError('MultiIndex columns are not supported. '
'You can convert them to strings using something like '
'`df.columns = ["_".join(col) for col in df.columns.values]`.')
The same change should be made to ui.table
, since it probably has the same limitation.
def dataframe_names(df):
if isinstance(df.index.names, str):
return [df.index.name]
else:
return df.index.names
def dataframe_col_names(col):
if isinstance(col, (int, float, str)):
return [col]
else:
return col
def pivot_dataframe_column_setting(df: DataFrame):
renameColumns = {}
def build_tree(tuples_list):
result = []
for tpl in tuples_list:
col_names = dataframe_col_names(tpl)
current_level = result
for elem in col_names:
renameColumns[str(col_names)] = "-".join(map(str, col_names))
match = next((item for item in current_level if item['headerName'] == elem), None)
if match:
current_level = match.setdefault('children', [])
else:
new_node = {'headerName': str(elem), 'headerClass': 'bg-sky-200'}
current_level.append(new_node)
current_level = new_node.setdefault('children', [])
def set_header_name(node, path=None):
if path is None:
path = []
if 'children' in node:
for child in node['children']:
set_header_name(child, path + [node['headerName']])
node['field'] = '-'.join(path + [node['headerName']])
else:
node['field'] = node['headerName']
for item in result:
set_header_name(item)
return result
columnDefs = build_tree(df.columns)
def remove_children(node):
if 'children' in node and not node['children']:
del node['children']
else:
for child in node.get('children', []):
remove_children(child)
for item in columnDefs:
remove_children(item)
return columnDefs, renameColumns
column_defs, renameColumns = pivot_dataframe_column_setting(df)
df_adjusted = DataFrame(df.to_records(index=index_not_none)).rename(columns=renameColumns)
grid = ui.aggrid({
'columnDefs': column_defs,
'rowData': df_adjusted.to_dict('records'),
'suppressDotNotation': True,
}, theme="alpine", auto_size_columns=False).style("--ag-borders: none")
grid.options['defaultColDef'] = {
# 'flex': 1,
'sortable': True,
'resizable': True,
'filter': True,
'editable': True,
'headerClass': 'bg-sky-200',
}
grid.options['pagination'] = True
grid.options['paginationPageSize'] = page_size
@falkoschindler I finally wrote it myself to generate aggrid's columnDefs and rowData through dataframe. Because in fact, the header of the table displayed at the end is nested, otherwise filtering cannot be performed.
But I think this method may have bugs in some scenarios, but I tried a lot of pivot or group by, and it worked fine.
Description
when my dataframe is pivot data, ui.aggrid.from_pandas unable to render