holoviz / panel

Panel: The powerful data exploration & web app framework for Python
https://panel.holoviz.org
BSD 3-Clause "New" or "Revised" License

Make it easy to use DataFrame with NestedSelect #6604

Open MarcSkovMadsen opened 6 months ago

MarcSkovMadsen commented 6 months ago

The new NestedSelect will be really useful. But 100% of my use cases start with a pandas DataFrame, and it's currently not very clear how to use one with NestedSelect.

I would recommend either

Personally I would strongly recommend the second option. I would suggest adding class methods similar to get_options_from_dataframe and create_from_dataframe to the NestedSelect.

Example Code

import panel as pn
from bokeh.sampledata.autompg import autompg_clean
import pandas as pd

def _build_nested_dict(df, depth=0, max_depth=None):
    if max_depth is None:
        max_depth = len(df.columns)

    # Base case: the last column supplies the leaf values, returned as a list
    if depth == max_depth - 1:
        return df[df.columns[depth]].tolist()

    # Recursive case: build dictionary at current depth
    nested_dict = {}
    for value in df[df.columns[depth]].unique():
        filtered_df = df[df[df.columns[depth]] == value]
        nested_dict[value] = _build_nested_dict(filtered_df, depth + 1, max_depth)
    return nested_dict

def get_options_from_dataframe(df, cols=None):
    if not cols:
        cols = list(df.columns)

    df = df[cols].drop_duplicates().sort_values(cols).reset_index(drop=True)
    options = _build_nested_dict(df)
    return options

def test_get_options_from_dataframe():
    data = {
        'continent': ['Europe', 'Europe', 'Asia', 'Asia', 'North America'],
        'country': ['France', 'France', 'Japan', 'Japan', 'USA'],
        'manufacturer': ['Fiat', 'Peugeot', 'Toyota', 'Nissan', 'Ford'],
        'model': ['500', '208', 'Corolla', 'Sentra', 'Mustang']
    }
    df = pd.DataFrame(data)
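    options = get_options_from_dataframe(df)
    # Rows are de-duplicated and sorted before nesting, so keys print in alphabetical order
    assert options == {
        'Asia': {'Japan': {'Nissan': ['Sentra'], 'Toyota': ['Corolla']}},
        'Europe': {'France': {'Fiat': ['500'], 'Peugeot': ['208']}},
        'North America': {'USA': {'Ford': ['Mustang']}},
    }
    print(options)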

test_get_options_from_dataframe()

def create_from_dataframe(df, cols=None, **params):
    if not cols:
        cols = list(df.columns)

    options = get_options_from_dataframe(df, cols)
    # Default the level labels to the column names unless the caller passed their own
    params.setdefault("levels", cols)
    return pn.widgets.NestedSelect(options=options, **params)

cols = ["origin", "mfr", "name"]

pn.extension()

select = create_from_dataframe(autompg_clean, cols=cols, levels=["Origin", "Manufacturer", "Name"])
select.servable()

https://github.com/holoviz/panel/assets/42288570/f5559379-1e7e-40f4-8194-6aa366cf8bf2

Additional Question

Is there some relation to the hvPlot/HoloViews widgets? When you use the groupby option in hvPlot, it must do something similar.

[x] Yes. I would be willing to provide a PR if the proposal is accepted by Philipp.

ahuang11 commented 6 months ago

I think this code also works (easier to copy/paste this one if anyone is looking for this).

import panel as pn
import pandas as pd
from collections import defaultdict
pn.extension()

data = {
    "world": ["Earth", "Earth", "Earth", "Earth", "Earth", "Earth"],
    "continent": ["Europe", "Europe", "Asia", "Asia", "North America", "North America"],
    "country": ["France", "France", "Japan", "Japan", "USA", "USA"],
    "manufacturer": ["Fiat", "Peugeot", "Toyota", "Nissan", "Ford", "Ford"],
    "model": ["500", "208", "Corolla", "Sentra", "Mustang", "Mustang"],
}
df = pd.DataFrame(data)

cols = list(df.columns)
grouped = df.groupby(cols[:-1])
nested = grouped[cols[-1]].apply(lambda x: x.tolist()).to_dict()
create_nested_defaultdict = lambda depth: defaultdict(
    lambda: create_nested_defaultdict(depth - 1)
)
nested_data = create_nested_defaultdict(len(cols) - 1)
for keys, values in nested.items():
    if isinstance(keys, str):
        keys = (keys,)
    current_dict = nested_data
    for i, key in enumerate(keys):
        if i != len(keys) - 1:
            current_dict = current_dict[key]
        else:
            current_dict[key] = values
pn.widgets.NestedSelect(options=nested_data)

Other than that, I would say a class method would be preferable, e.g. pn.widgets.NestedSelect.from_dataframe(df).
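One caveat with the defaultdict approach: nested_data stays a nested defaultdict, so looking up a missing key later silently creates an empty entry instead of raising KeyError. If that matters, the structure can be converted to plain dicts first. A minimal sketch, assuming leaves are lists as in the snippet above; to_plain_dict and make_tree are hypothetical helper names, not Panel or pandas functions:

```python
from collections import defaultdict


def to_plain_dict(node):
    """Recursively convert nested defaultdicts (or dicts) into plain dicts."""
    if isinstance(node, dict):  # defaultdict is a dict subclass
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node  # leaves (e.g. lists of models) are returned unchanged


def make_tree():
    """Stand-in for nested_data: a defaultdict that nests without limit."""
    return defaultdict(make_tree)


nested_data = make_tree()
nested_data["Earth"]["Asia"]["Japan"]["Toyota"] = ["Corolla"]
nested_data["Earth"]["Europe"]["France"]["Fiat"] = ["500"]

plain = to_plain_dict(nested_data)
print(type(plain).__name__)
print(plain)
```

After the conversion, plain["Mars"] raises KeyError, whereas nested_data["Mars"] would quietly add an empty "Mars" branch.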

MarcSkovMadsen commented 5 months ago

One part of the answer to https://discourse.holoviz.org/t/overwhelmed-by-with-holoviews-hvplot-panel-workflow-permutations-concepts/7141 is to convert the MultiIndex of a DataFrame to a nested dict and use it with NestedSelect.

This is not trivial to do. I still hope I can convince the core devs that users need helper functions to convert a DataFrame, MultiIndex, etc. to a nested dict.

The code is below.

import pandas as pd

def multiindex2dict(p: pd.MultiIndex|dict) -> dict:
    """
    Converts a pandas MultiIndex to a nested dict.
    :param p: As this is a recursive function, p is initially a pd.MultiIndex, but after the first pass it takes
    the internal_dict value, so it becomes a dictionary
    """
    internal_dict = {}
    end = False
    for x in p:
        # Since multi-indexes have a descending hierarchical structure, it is convenient to start from the last
        # element of each tuple. That is, we build the nesting from the lowest level up to the highest one.
        if isinstance(p, pd.MultiIndex):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used. Only for 2 levels
            # pd.MultiIndex
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = [x[-1]]
            else:
                internal_dict[t].append(x[-1])
        elif isinstance(x, tuple):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = {x[-1]: p[x]}
            else:
                internal_dict[t][x[-1]] = p[x]

    # Uncomment this line to know how the dictionary is generated starting from the lowest level
    # print(internal_dict)
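    # Stop once the top level is reached; an empty input also stops here instead of recursing forever
    if end or not internal_dict: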
        return internal_dict
    return multiindex2dict(internal_dict)
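
For comparison, the same nesting can probably be built in a single non-recursive pass over the index tuples. This is a minimal sketch rather than a proposed API: tuples_to_nested is a hypothetical helper name, and it assumes every tuple has the same length (at least 2), as the entries of a pd.MultiIndex do. It takes plain tuples so that it runs without pandas, and unlike multiindex2dict it skips duplicate leaf values.

```python
def tuples_to_nested(tuples):
    """Nest equal-length tuples into dicts, with a list at the deepest level.

    Hypothetical helper for illustration; assumes each tuple has >= 2 items.
    """
    nested = {}
    for *parents, last_key, leaf in tuples:
        node = nested
        # Walk down (or create) one dict per parent level.
        for key in parents:
            node = node.setdefault(key, {})
        # The second-to-last item maps to a list of leaf values.
        leaves = node.setdefault(last_key, [])
        if leaf not in leaves:  # skip duplicate rows
            leaves.append(leaf)
    return nested


index_tuples = [
    ("Europe", "France", "Peugeot"),
    ("Europe", "Italy", "Fiat"),
    ("Asia", "Japan", "Toyota"),
    ("Asia", "Japan", "Nissan"),
]
options = tuples_to_nested(index_tuples)
print(options)
```

Iterating a pd.MultiIndex yields tuples, so tuples_to_nested(df.index) should work for a DataFrame with a MultiIndex; for plain columns, df[cols].itertuples(index=False, name=None) yields the same kind of tuples. Keys keep the order in which they first appear, so sort the input beforehand if the widget options should be ordered.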