pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: Introduce type-safe constructors for `Timestamp` and `Timedelta`. #58475

Open randolf-scholz opened 2 months ago

randolf-scholz commented 2 months ago

Feature Type

Problem Description

The default constructors pd.Timestamp.__new__ and pd.Timedelta.__new__ can return NaT, which is an instance of NaTType rather than of Timestamp or Timedelta. Depending on the type checker used, this can lead to silent type errors. Consider the following example:

import numpy as np
from pandas import Timedelta, Timestamp

t: Timestamp = Timestamp("2024-04-29T18:00:00")
t2: Timestamp = Timestamp(np.datetime64("nat"))  # actually NaTType!
dt: Timedelta = Timedelta(1, "h")
dt2: Timedelta = Timedelta(np.timedelta64("nat"))  # actually NaTType!
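
The mismatch can also be confirmed at runtime. The following is a minimal check, assuming a recent pandas and NumPy are installed; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Both constructors hand back the pd.NaT singleton for missing input.
ts = pd.Timestamp(np.datetime64("nat"))
td = pd.Timedelta(np.timedelta64("nat"))

print(ts is pd.NaT, td is pd.NaT)  # True True
# Neither result is an instance of the class whose constructor was called.
print(isinstance(ts, pd.Timestamp), isinstance(td, pd.Timedelta))  # False False
print(type(ts).__name__)  # NaTType
```

So the annotations `t2: Timestamp` and `dt2: Timedelta` above are wrong at runtime, even if a type checker accepts them.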

Type-checking results:

Feature Description

Introduce new constructors timestamp and timedelta (analogous to pyarrow's constructor functions), which are guaranteed to return pd.Timestamp and pd.Timedelta instances, and which raise an exception when the result would be NaT.

Alternative Solutions

Split pd.NaT into two different types, Timestamp("NaT") and Timedelta("NaT") (as is the case in numpy), which are instances of the respective types. (https://github.com/pandas-dev/pandas/issues/24983)
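
For comparison, here is a small sketch of the NumPy behaviour referred to above, where each missing value keeps the type of the constructor that produced it. It uses only `np.datetime64`, `np.timedelta64`, and `np.isnat`; the variable names are illustrative.

```python
import numpy as np

nat_dt = np.datetime64("nat")
nat_td = np.timedelta64("nat")

# Each missing value is an instance of its own scalar type.
assert isinstance(nat_dt, np.datetime64)
assert isinstance(nat_td, np.timedelta64)
assert not isinstance(nat_dt, np.timedelta64)

# np.isnat recognises both as "not a time".
assert np.isnat(nat_dt) and np.isnat(nat_td)
```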

Additional Context

randolf-scholz commented 2 months ago

These constructors can be very simple wrappers, a rough sketch:

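from typing import Any, Optional

from pandas import Timedelta, Timestamp
from pandas._libs.tslibs.nattype import NaTType  # private module path; may change between pandas versions

def timedelta(value: Any = ..., unit: Optional[str] = None, **kwargs: Any) -> Timedelta: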
    """Utility function that ensures that the constructor does not return NaT."""
    td = (
        Timedelta(unit=unit, **kwargs)
        if value is Ellipsis
        else Timedelta(value, unit=unit, **kwargs)
    )
    if isinstance(td, NaTType):
        raise ValueError("Constructor returned NaT")
    return td

def timestamp(value: Any = ..., **kwargs: Any) -> Timestamp:
    """Utility function that ensures that the constructor does not return NaT."""
    ts = Timestamp(**kwargs) if value is Ellipsis else Timestamp(value, **kwargs)
    if isinstance(ts, NaTType):
        raise ValueError("Constructor returned NaT")
    return ts
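
A quick usage check of the intended behaviour might look like the following. It restates the timestamp wrapper in compact form so the snippet runs on its own; this is a sketch of the proposal, not an existing pandas API.

```python
import numpy as np
import pandas as pd

def timestamp(value, **kwargs) -> pd.Timestamp:
    """Compact restatement of the proposed wrapper: reject NaT results."""
    ts = pd.Timestamp(value, **kwargs)
    if ts is pd.NaT:
        raise ValueError("Constructor returned NaT")
    return ts

# A valid input yields a genuine Timestamp.
ok = timestamp("2024-04-29T18:00:00")
assert isinstance(ok, pd.Timestamp)

# A missing input raises instead of silently returning NaT.
try:
    timestamp(np.datetime64("nat"))
except ValueError as exc:
    print(exc)  # Constructor returned NaT
```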

jbrockmendel commented 2 months ago

There’s an issue about introducing a separate NaTD specific to Timedelta. If you did that (and the same for Period), then NaT could become a Timestamp, and you would get type safety in the existing constructors without adding new ones.