zauberzeug / nicegui

Create web-based user interfaces with Python. The nice way.
https://nicegui.io
MIT License

ui.aggrid.from_pandas not working with DatetimeIndex #1698

Closed johancj closed 1 year ago

johancj commented 1 year ago

Description

I have a pandas.DataFrame with a column of dtype datetime64[ns] (created from a DatetimeIndex).

When used in ui.aggrid.from_pandas(df) it throws an error: TypeError: Type is not JSON serializable: Timestamp

Small example:

import pandas as pd
from nicegui import ui

df = pd.DataFrame({'numbers': [i for i in range(3)],
                   'dates': pd.date_range('2023-01-01', periods=3)})

ui.aggrid.from_pandas(df)

ui.run()

I have found a workaround: casting the datetime columns to str.

date_cols = df.columns[df.dtypes=='datetime64[ns]']
df[date_cols] = df[date_cols].astype(str) # Convert datetime to str

I think this should be handled by nicegui, either by performing the cast automatically or in some other way.

falkoschindler commented 1 year ago

Thanks for reporting this issue, @johancj, and for sharing the workaround!

I think this should be handled by nicegui

I'm not sure whether simply converting via .astype(str) is always the desired behavior, or whether there are better alternatives, which makes it hard to settle on a common solution baked into ui.aggrid.

And is 'datetime64[ns]' the only datatype we need to consider, or is this only the tip of the iceberg?

johancj commented 1 year ago

It probably is just the tip of the iceberg. I tested some different dtypes and got the same issue with the following:

datetime64[ns]
timedelta64[ns]
complex128

I tested the following types:

int64
float64
object
bool
datetime64[ns]
category
timedelta64[ns]
complex128
uint8
int32
float32
uint16
int16
int8

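One way to check which dtypes survive serialization is to try serializing each column's values and catch the failures. A minimal sketch, using the standard library's json module as a stand-in for whatever serializer the framework uses internally (column names here are made up for illustration):

```python
import json

import pandas as pd

df = pd.DataFrame({
    'ints': [1, 2, 3],
    'dates': pd.date_range('2023-01-01', periods=3),
    'deltas': pd.to_timedelta(['1 day', '2 days', '3 days']),
    'complex': [1 + 2j, 2 + 3j, 3 + 4j],
})

# try to serialize each column the way a JSON response would
for name, column in df.items():
    try:
        json.dumps(column.tolist())
        print(f'{name}: serializable ({column.dtype})')
    except TypeError:
        print(f'{name}: NOT serializable ({column.dtype})')
```

Running this flags datetime64[ns], timedelta64[ns], and complex128 as problematic while int64 passes, matching the list above.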
falkoschindler commented 1 year ago

Ok, here is an example with all problematic datatypes I could find and their conversion:

from datetime import datetime, timedelta
import pandas as pd
from nicegui import ui

df = pd.DataFrame({
    'datetime_col': [datetime.now(), datetime(2020, 1, 1), datetime(2022, 12, 31)],
    'timedelta_col': [timedelta(days=5), timedelta(days=10), timedelta(days=15)],
    'complex_col': [1 + 2j, 2 + 3j, 3 + 4j],
    'period_col': pd.Series([pd.Period('2021-01'), pd.Period('2021-02'), pd.Period('2021-03')]),
})

date_cols = df.columns[df.dtypes == 'datetime64[ns]']
time_cols = df.columns[df.dtypes == 'timedelta64[ns]']
complex_cols = df.columns[df.dtypes == 'complex128']
period_cols = df.columns[df.dtypes == 'period[M]']
df[date_cols] = df[date_cols].astype(str)
df[time_cols] = df[time_cols].astype(str)
df[complex_cols] = df[complex_cols].astype(str)
df[period_cols] = df[period_cols].astype(str)

ui.aggrid.from_pandas(df)

ui.run()

But I'm not sure if we really should convert these columns as part of from_pandas.

Long story short, I see two solutions:

  1. If the dataframe contains such datatypes, make a copy and convert the respective columns. Add a note to the docstring that this is happening automatically. This way the method works, and if the user wants to tweak the conversion, they can easily do it outside and circumvent the automatic conversion.
  2. Detect these datatypes and print a warning with an instruction how to handle the JSON serialization problem and a link to this issue.

Contrary to my initial reaction I tend towards solution 1. What do you think, @johancj?

bapowell commented 1 year ago

@falkoschindler I agree with your first suggested solution. Moreover, copying the dataframe could be conditional, i.e. don't copy if all of the column selections (date_cols, time_cols, complex_cols, period_cols) are empty. That way no copy penalty is incurred unless the dataframe actually contains those problematic data types.
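Solution 1 with the conditional copy could look roughly like this. This is only a sketch, not the actual NiceGUI implementation; the set of problematic dtypes (datetime, timedelta, complex, period) is an assumption based on the examples above, and the function name is invented:

```python
import pandas as pd


def convert_problematic_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with JSON-unserializable columns cast to str,
    or the original frame unchanged if no such columns exist."""
    # kind codes: 'M' = datetime64, 'm' = timedelta64, 'c' = complex;
    # period dtypes need an explicit isinstance check
    problematic = df.columns[[dtype.kind in 'Mmc' or isinstance(dtype, pd.PeriodDtype)
                              for dtype in df.dtypes]]
    if problematic.empty:
        return df  # no copy penalty for well-behaved dataframes
    df = df.copy()
    df[problematic] = df[problematic].astype(str)
    return df
```

Because the original frame is returned untouched when nothing needs converting, the common case stays cheap, and users who want custom conversion can simply pre-convert their columns so the detection finds nothing.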

By the way, we ran into this same issue with JustPy. The solution there was to implement a row_data_converter, which is flexible, but I think the solution you guys have come up with here is nicer.

Finally, if the need for more manipulation of the dataframe becomes apparent, we could enhance the from_pandas method with the same "dtype" and "converters" optional args that the pandas read_csv method includes. The following is an example of that (from here):

import numpy as np
import pandas as pd

# convert_currency and convert_percent are helper functions defined in the linked article
df_2 = pd.read_csv("sales_data_types.csv",
                   dtype={'Customer Number': 'int'},
                   converters={'2016': convert_currency,
                               '2017': convert_currency,
                               'Percent Growth': convert_percent,
                               'Jan Units': lambda x: pd.to_numeric(x, errors='coerce'),
                               'Active': lambda x: np.where(x == "Y", True, False)
                              })

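A from_pandas variant with such a converters argument might be sketched like this. The signature is purely hypothetical (it is not part of NiceGUI's API), and the function here just produces row dicts rather than a real grid element:

```python
from typing import Callable, Dict, Optional

import pandas as pd


def from_pandas_with_converters(df: pd.DataFrame,
                                converters: Optional[Dict[str, Callable]] = None) -> list:
    """Apply per-column converters before turning the frame into row dicts."""
    if converters:
        df = df.copy()  # leave the caller's frame untouched
        for column, func in converters.items():
            df[column] = df[column].map(func)
    return df.to_dict(orient='records')


rows = from_pandas_with_converters(
    pd.DataFrame({'dates': pd.date_range('2023-01-01', periods=2)}),
    converters={'dates': str},
)
```

This keeps the default behavior unchanged while giving users an escape hatch for exotic dtypes, similar in spirit to read_csv's converters.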
johancj commented 1 year ago

@falkoschindler I also agree that option 1 seems like the best idea. As you said, this way it at least works out of the box.