ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

[Bug Report] ValueError: NaTType does not support strftime #1433

Open yuzeh opened 1 year ago

yuzeh commented 1 year ago

Current Behaviour

Rendering a ProfileReport with tsmode=True crashes while rendering the time-series gaps; the stack trace leads to _render_gap_tab.

Expected Behaviour

see code

Data Description

see code

Code that reproduces the bug

import pandas as pd
import numpy as np
from ydata_profiling import ProfileReport

df = pd.DataFrame({"dt": pd.date_range(pd.to_datetime("2023-01-01"), pd.to_datetime("2023-02-01")), "y": np.arange(32)})
profile = ProfileReport(
    df,
    tsmode=True,
    sortby="dt",
    type_schema={
        "dt": "datetime",
        "y": "timeseries",
    },
)
profile.widgets

pandas-profiling version

v4.5.1

Dependencies

pandas==2.0.3

OS

No response

priamai commented 1 year ago

+1 on this; I am experiencing the same issue.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 477 entries, 0 to 476
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   DT            477 non-null    datetime64[ns]
 1   CHANNEL       477 non-null    object        
 2   IMPRESSIONS   477 non-null    int64         
 3   CLICKS        477 non-null    int64         
 4   CONVERSIONS   477 non-null    int64         
 5   AD_SPEND_USD  477 non-null    float64       
dtypes: datetime64[ns](1), float64(1), int64(3), object(1)
memory usage: 22.5+ KB

profile = ProfileReport(
    site_df,
    tsmode=True,
    type_schema=type_schema,
    sortby="DT",
    title="Time-Series EDA for a channel",
)

profile.to_file("report_timeseries.html")

I am getting the same error as @yuzeh: ValueError: NaTType does not support strftime

kylelt commented 1 year ago

+1, I am also having this issue. I don't have time for a proper fix, so I patched in a workaround.

If you are desperate, this will bypass the issue:

In ydata_profiling/report/formatters.py, change the function fmt_numeric (starting at line 236) from:

@list_args
def fmt_numeric(value: float, precision: int = 10) -> str:
    """Format any numeric value.

    Args:
        value: The numeric value to format.
        precision: The numeric precision

    Returns:
        The numeric value with the given precision.
    """

    fmtted = f"{{:.{precision}g}}".format(value)

    for v in ["e+", "e-"]:
        if v in fmtted:
            sign = "-" if v in "e-" else ""
            fmtted = fmtted.replace(v, " × 10<sup>") + "</sup>"
            fmtted = fmtted.replace("<sup>0", "<sup>")
            fmtted = fmtted.replace("<sup>", f"<sup>{sign}")

    return fmtted

Patched version (consequences unknown):

@list_args
def fmt_numeric(value: float, precision: int = 10) -> str:
    """Format any numeric value.

    Args:
        value: The numeric value to format.
        precision: The numeric precision

    Returns:
        The numeric value with the given precision.
    """
    fmtted = None
    try:
        fmtted = f"{{:.{precision}g}}".format(value)
    except Exception as e:
        fmtted = str(value)+'e+1'

    for v in ["e+", "e-"]:
        if v in fmtted:
            sign = "-" if v in "e-" else ""
            fmtted = fmtted.replace(v, " × 10<sup>") + "</sup>"
            fmtted = fmtted.replace("<sup>0", "<sup>")
            fmtted = fmtted.replace("<sup>", f"<sup>{sign}")

    return fmtted

I suspect the value reaches fmt_numeric by mistake, though: the stack trace shows it falling through the catch-all "else" branch of one of the fmt_time functions. Happy bug hunting.
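One possible explanation for why wrapping the format call hides the crash, offered as an assumption rather than a confirmed diagnosis: Python's datetime types pass any non-empty format spec to strftime, so a datetime-like value that reaches fmt_numeric has the ".10g" spec routed through strftime. The sketch below shows that delegation with the standard library only. It does not use pd.NaT, and whether NaT takes exactly this path inside ydata-profiling is not verified here. The names format_like_fmt_numeric and fmt_numeric_guarded are hypothetical and not part of the library, and the HTML exponent handling of the real function is left out.

```python
from datetime import datetime
from numbers import Real


def format_like_fmt_numeric(value, precision: int = 10) -> str:
    """Apply the same "g" format spec that fmt_numeric builds."""
    return f"{{:.{precision}g}}".format(value)


def fmt_numeric_guarded(value, precision: int = 10) -> str:
    """Hypothetical guard: use the numeric spec only for real numbers."""
    if isinstance(value, Real):
        return format_like_fmt_numeric(value, precision)
    # Anything else (datetimes, None, ...) falls back to plain str().
    return str(value)


# A float is formatted numerically.
print(format_like_fmt_numeric(1234.5))  # 1234.5

# A datetime hands the spec ".10g" to strftime, which returns it as literal text.
print(format_like_fmt_numeric(datetime(2023, 1, 1)))  # .10g

# The guarded variant never sends a numeric spec to a datetime-like value.
print(fmt_numeric_guarded(datetime(2023, 1, 1)))  # 2023-01-01 00:00:00
```

Compared with a blanket try/except, a type check like this avoids swallowing unrelated errors, but it still only treats the symptom; the better fix is probably to find where a non-numeric value gets passed to fmt_numeric in the first place.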

fabclmnt commented 1 year ago

Hi @yuzeh

Thanks for creating this issue. Indeed, it seems to be something that only happens with pandas versions 2 and later. I've added it to the backlog of tasks for the next package release.

@kylelt would you be open to contributing a PR?

kylelt commented 1 year ago

Yeah, though given my availability the ETA will be early December.

mritonia commented 1 year ago

This is not isolated to pandas >= 2. I tested with pandas == 1.5.3 and I see the same error.

fabclmnt commented 1 year ago

We have been using pandas versions < 2 and weren't able to reproduce this error. It would be useful if you could share more details on your environment (Python version, package versions, etc.).
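For anyone gathering those details, here is a minimal sketch that prints them using only the standard library. The helper name environment_report and the default package list are illustrative choices, not something the maintainers asked for; add whichever packages are relevant to your setup.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("ydata-profiling", "pandas", "numpy")) -> dict:
    """Collect the Python version, OS, and installed package versions."""
    report = {"python": platform.python_version(), "os": platform.platform()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting the printed lines into a comment gives enough to compare a failing environment against one where the error does not reproduce.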

jmrichardson commented 10 months ago

Running into this issue as well. Is there a fix on the horizon?