feat: support ibis tables (...and define a more formal protocol)

NickCrews commented 3 weeks ago

I would love to render (very very large, ie millions of rows) ibis tables in solara.DataTable. This currently doesn't work because

solara calls len(df) on the data. ibis uses .count() instead, to make the execution very explicit
the logic to go from ibis table to a list[dict] of records isn't implemented for ibis
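
The two gaps above can be sketched with a pair of helpers. This is a minimal sketch, not solara's actual code: `table_len` and `table_records` are hypothetical names, and the lazy branch assumes an ibis-style interface (`.count()`, `.limit(n, offset=...)`, and `.execute()` returning a pandas DataFrame). `FakeLazyTable` is a stand-in that only mimics that assumed interface so the example runs without ibis installed.

```python
from typing import Any, Dict, List


def table_len(table: Any) -> int:
    """Row count for plain sequences (len) and lazy tables (explicit count)."""
    if isinstance(table, (list, tuple)):
        return len(table)
    # Assumption: a lazy table follows the ibis convention .count().execute().
    return int(table.count().execute())


def table_records(table: Any, start: int, stop: int) -> List[Dict[str, Any]]:
    """Materialize only rows [start, stop) as plain dicts."""
    if isinstance(table, list):
        return table[start:stop]
    # Assumption: ibis-style limit/offset, with execute() returning a
    # pandas DataFrame whose to_dict("records") yields a list of dicts.
    return table.limit(stop - start, offset=start).execute().to_dict("records")


class FakeLazyTable:
    """Stand-in that mimics the assumed lazy interface; not part of ibis."""

    def __init__(self, rows, scalar=None):
        self._rows, self._scalar = rows, scalar

    def count(self):
        return FakeLazyTable(self._rows, scalar=len(self._rows))

    def limit(self, n, offset=0):
        return FakeLazyTable(self._rows[offset:offset + n])

    def execute(self):
        return self if self._scalar is None else self._scalar

    def to_dict(self, orient):
        return list(self._rows)


lazy = FakeLazyTable([{"id": i} for i in range(1000)])
print(table_len(lazy), table_records(lazy, 10, 12))
```

Dispatch for pandas or vaex objects is left out on purpose; the point is only that the count and the row window are requested explicitly, so a lazy backend never has to materialize the whole table.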

I see a few ways to support this:

manually add ibis support :)
Make some more formal protocol how solara interacts with dataframes, and then let users implement the translation layer themselves.
Use ibis itself as your dataframe abstraction layer :) Another dependency, doesn't support vaex, probably not worth it.

I'm cooking up a little PR right now that at least refactors things to make them a bit cleaner, regardless of whether we do something more drastic. Thank you!

maartenbreddels commented 3 weeks ago

Hi Nick,

we share a common dream then!

Our idea for the DataTable (or DataFrame) is to split off a very dumb view that takes in the length, the records, etc. Our current DataTable would then use that dumb view, so that others (like you) can easily build their own. However, that is independent of #600, and I like that PR. What do you think of this idea?

Note there is also https://github.com/data-apis/dataframe-api

Also related to this is: https://codepen.io/eddie1952/pen/jObPvKO which would make it scrollable for very large lengths, although it would be limited to a fixed row height.

Regards,

Maarten

NickCrews commented 3 weeks ago

If I understand correctly, you are suggesting making the DataTable component take arguments records: list[dict], n: int instead of the current df: DataFrameLike? I am a small -1 on this idea:

if we show 0 rows, then we still need to know the schema of the table. So that would have to get passed too
In the format callback, we pass in the original dataframe. People can do a lot more useful processing on a real DataFrame than a list[dict]
If people have to do the to_records() etc themselves, they are going to make mistakes
to do lazy row slicing, we would need to also accept a callback of the form (start, stop) -> list[dict]
it's not THAT hard for us to be helpful and accept the types that 95% of users will be working with natively.
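
The lazy-slicing point can be made concrete. Below is a minimal sketch, assuming a hypothetical `(start, stop) -> list[dict]` callback type named `FetchRows` and a hypothetical helper `page_items`; neither exists in solara. It shows that a records-only interface still needs the total row count `n` and a callback in order to fetch one page at a time.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
FetchRows = Callable[[int, int], List[Record]]  # (start, stop) -> rows


def page_items(fetch: FetchRows, n: int, page: int, items_per_page: int = 20) -> List[Record]:
    """Fetch only the rows of one page, clamping the range to the table length n."""
    start = min(page * items_per_page, n)
    stop = min(start + items_per_page, n)
    return fetch(start, stop)


calls = []  # records which ranges were actually requested


def fetch_from_memory(start: int, stop: int) -> List[Record]:
    """Example callback; a real one would run a limit/offset query instead."""
    calls.append((start, stop))
    return [{"id": i} for i in range(start, stop)]


# 45 rows, 20 per page: the third page (index 2) holds rows 40..44.
last_page = page_items(fetch_from_memory, n=45, page=2)
print(len(last_page), calls)
```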

But I also notice that solara doesn't depend on pandas by itself. So if someone is just making web requests and getting JSON back, it would be nice if they could plot that in a DataTable as a list[dict] without needing to install pandas for a useless round trip.

If we made a lightweight class SolaraDF(Protocol) with the ~5 methods needed in https://github.com/widgetti/solara/pull/600 (eg .count(), .columns(), .to_records(), etc), then we could use that internally and have a few conversion functions that take pandas/vaex/ibis objects and wrap them in it. I think I like this path the most. It would also be easier to transition to the dataframe API later, since users would already be passing in dataframes.
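
A rough sketch of what such a protocol could look like follows. Everything here is an assumption for illustration: the method set is guessed from the examples named above plus a `slice` method for paging, and `RecordsDF` and `as_solara_df` are hypothetical names. The adapter wraps a plain list[dict], which covers the JSON case without requiring pandas.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class SolaraDF(Protocol):
    """Hypothetical minimal interface a table component would rely on."""

    def count(self) -> int: ...
    def columns(self) -> List[str]: ...
    def slice(self, start: int, stop: int) -> "SolaraDF": ...
    def to_records(self) -> List[Dict[str, Any]]: ...


class RecordsDF:
    """Adapter for a plain list of dicts, e.g. decoded JSON."""

    def __init__(self, records: List[Dict[str, Any]], columns: Optional[List[str]] = None):
        self._records = list(records)
        if columns is not None:
            # Explicit columns keep the schema available even with 0 rows.
            self._columns = list(columns)
        else:
            self._columns = list(self._records[0]) if self._records else []

    def count(self) -> int:
        return len(self._records)

    def columns(self) -> List[str]:
        return list(self._columns)

    def slice(self, start: int, stop: int) -> "RecordsDF":
        return RecordsDF(self._records[start:stop], self._columns)

    def to_records(self) -> List[Dict[str, Any]]:
        return list(self._records)


def as_solara_df(data: Any) -> SolaraDF:
    """Wrap supported inputs; pandas/vaex/ibis adapters would be added here."""
    if isinstance(data, SolaraDF):
        return data
    if isinstance(data, list):
        return RecordsDF(data)
    raise TypeError(f"unsupported table type: {type(data).__name__}")


df = as_solara_df([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
print(df.count(), df.columns(), df.slice(1, 2).to_records())
```

Because the protocol is structural, a third-party adapter would not need to import or subclass anything from solara; it only has to provide the same methods.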

> Note there is also https://github.com/data-apis/dataframe-api

I'm a little familiar, but not really. Does this support lazy slicing? ie could you get rows 1,000,000 to 1,000,100 without materializing? Does this require people to have eg pandas installed for the final output step, or can you materialize to vanilla python list[dict]s? Does this support all the other methods we need? If it does everything we need, then seems like the logical way to implement it.

maartenbreddels commented 3 weeks ago

No, I suggested splitting off the 'view' part of the dataframe component (say, call it DataFrameView) into a separate component, such that the higher-level DataFrame component uses the DataFrameView component. This makes it easier to support other data sources by building on top of DataFrameView.

We have the same idea for our FileBrowser. It currently has a lot of filesystem-specific parts. If we split the view off into a FileBrowserView and keep the filesystem-specific part in the FileBrowser component, someone could build an S3FileBrowser component on top of FileBrowserView.

Slicing is supported: https://data-apis.org/dataframe-api/draft/API_specification/dataframe_object.html#dataframe_api.DataFrame.slice_rows

So this seems like a promising way forward.

NickCrews commented 2 weeks ago

Hmm, I'm not sure I totally understand. Maybe if you give some code stubs with the proposed API then I could make a PR based on that?

maartenbreddels commented 1 week ago


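from typing import Callable, List, Optional

import solara

# ColumnAction, CellAction, format_default and the df_* helpers are assumed
# to come from solara's existing datatable code.


@solara.component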
def DataTableView(
    records,
    column_names,
    page=0,
    on_page=None,
    items_per_page=20,
    format=None,
    column_actions: List[ColumnAction] = [],
    cell_actions: List[CellAction] = [],
    scrollable=False,
    on_column_header_hover: Optional[Callable[[Optional[str]], None]] = None,
    column_header_info: Optional[solara.Element] = None,
):
    return DataTableWidget.element(...)

@solara.component
def DataFrame(
    df,
    items_per_page=20,
    format=None,
    column_actions: List[ColumnAction] = [],
    cell_actions: List[CellAction] = [],
    scrollable=False,
    on_column_header_hover: Optional[Callable[[Optional[str]], None]] = None,
    column_header_info: Optional[solara.Element] = None,
):
    columns = use_df_column_names(df)
    page, set_page = solara.use_state(0)
    format = format or format_default

    # row range of the current page
    i1 = page * items_per_page
    i2 = i1 + items_per_page
    dfs = df_slice(df, i1, i2)
    records = df_records(dfs)
    items = []
    for i, record in enumerate(records):  # the last page may hold fewer rows
        item = {"__row__": i + i1}  # special key for the row number
        for column in columns:
            item[column] = format(dfs, column, i + i1, record[column])
        items.append(item)

    return DataTableView(
        items,
        columns,
        page=page,
        on_page=set_page,
        items_per_page=items_per_page,
        column_actions=column_actions,
        cell_actions=cell_actions,
        scrollable=scrollable,
        on_column_header_hover=on_column_header_hover,
        column_header_info=column_header_info,
    )

@solara.component
def IbisTable(t):
    # convert t to records
    records = []
    page, set_page = solara.use_state(0)
    return solara.DataTableView(records,
                                t.columns,
                                page=page,
                                on_page=set_page,
                                items_per_page=20,
                                format=format_default)

Something along these lines. I hope it makes sense. Note that here we have DataFrame, which does some convenience things, like doing the pagination for us, and DataTableView, which is stateless and just shows what it's given. This makes it easier to create an IbisTable component, which can be created outside of solara.
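
The one piece an external component such as IbisTable would have to reproduce is turning a page of records into the items handed to the view. A minimal sketch of that step as a plain function follows, assuming the `__row__` convention from the stub above; `build_items` and `format_plain` are hypothetical names, and the formatter signature is simplified (it does not receive the dataframe).

```python
from typing import Any, Callable, Dict, List, Sequence


def format_plain(column: str, row_index: int, value: Any) -> str:
    """Simplified formatter: render every value with str()."""
    return str(value)


def build_items(
    records: Sequence[Dict[str, Any]],
    columns: Sequence[str],
    offset: int,
    format: Callable[[str, int, Any], str] = format_plain,
) -> List[Dict[str, Any]]:
    """Attach absolute row numbers and formatted cell values to one page of records."""
    items = []
    for i, record in enumerate(records):
        item: Dict[str, Any] = {"__row__": offset + i}  # special key for the row number
        for column in columns:
            item[column] = format(column, offset + i, record[column])
        items.append(item)
    return items


# Third page of a 20-per-page table: rows start at absolute index 40.
items = build_items([{"name": "a", "n": 1}, {"name": "b", "n": 2}], ["name", "n"], offset=40)
print(items)
```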

I think we have to live with DataTable and keep it as is, to not break old code.

widgetti / solara

feat: support ibis tables (...and define a more formal protocol) #599