pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.83k stars, 18k forks

ENH: Decimal year #60391

Open dshean opened 12 hours ago

dshean commented 12 hours ago

Feature Type

Problem Description

I wish I could use pandas to quickly convert datetime/Timestamp objects to "decimal year" floating point numbers for subsequent visualization and analysis.

A number of plotting packages (e.g., GeoPandas, matplotlib) encounter issues when casting datetime/Timestamp objects to float. For example, I often encounter errors when trying to create a choropleth map to visualize a GeoDataFrame column containing datetime objects. Decimal years also simplify the legend/colorbar labels.

(Image: example decimal year map)

Feature Description

Here is a simple function that does this. It's not perfect, but it does the job. It would need to be re-implemented as a Timestamp and/or dt accessor property (dt.decyear), which should be relatively simple, I think.

# Decimal year (useful for plotting)
from datetime import datetime as dt
import time


def toYearFraction(date):
    # Seconds since the epoch; time.mktime interprets the time tuple as local time
    def sinceEpoch(date):
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year = date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    # Fraction of the current year that has elapsed
    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction

Alternative Solutions

Define and apply a custom function: df['dt_col_decyear'] = df['dt_col'].apply(toYearFraction)
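
A vectorized alternative to `.apply` can be sketched with fields the `dt` accessor already exposes (`year`, `dayofyear`, `is_leap_year`, `hour`, ...). The name `to_decimal_year` is hypothetical and not part of pandas. The sketch works on wall-clock fields, so it assumes every day is 24 hours long (DST transitions in tz-aware data are ignored), and it drops sub-microsecond precision.

```python
import pandas as pd


def to_decimal_year(s: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a datetime64 Series to float decimal years."""
    dt = s.dt
    # 366 days in leap years, 365 otherwise
    days_in_year = 365 + dt.is_leap_year.astype(int)
    # Wall-clock seconds elapsed since midnight
    seconds_into_day = (
        dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    )
    # Whole days elapsed since Jan 1, plus the fraction of the current day
    days_elapsed = (dt.dayofyear - 1) + seconds_into_day / 86400
    return dt.year + days_elapsed / days_in_year


# Illustrative data only
df = pd.DataFrame(
    {"dt_col": pd.to_datetime(["2024-01-01 00:00:00", "2023-07-02 12:00:00"])}
)
df["dt_col_decyear"] = to_decimal_year(df["dt_col"])
print(df)
```

Because it avoids a Python-level call per row, this should scale better than `.apply` on large columns.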

Additional Context

When attempting to plot a column containing datetime values...

gdf.plot(column='dt_col', legend=True)

File ~/sw/miniconda3/envs/shean_py3/lib/python3.12/site-packages/geopandas/plotting.py:175, in _plot_polygon_collection(ax, geoms, values, color, cmap, vmin, vmax, autolim, **kwargs)
    172 collection = PatchCollection([_PolygonPatch(poly) for poly in geoms], **kwargs)
    174 if values is not None:
--> 175     collection.set_array(np.asarray(values))
    176     collection.set_cmap(cmap)
    177     if "norm" not in kwargs:

File ~/sw/miniconda3/envs/shean_py3/lib/python3.12/site-packages/matplotlib/cm.py:452, in ScalarMappable.set_array(self, A)
    450 A = cbook.safe_masked_invalid(A, copy=True)
    451 if not np.can_cast(A.dtype, float, "same_kind"):
--> 452     raise TypeError(f"Image data of dtype {A.dtype} cannot be "
    453                     "converted to float")
    455 self._A = A
    456 if not self.norm.scaled():

TypeError: Image data of dtype object cannot be converted to float
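
One way the `dtype object` in this traceback can arise, assuming the column is timezone-aware (an assumption; an object column of Python `datetime` values would behave the same way): `np.asarray` on a tz-aware Series yields an object array of Timestamps, which fails the `np.can_cast` check shown above. A rough workaround sketch is to plot a float column instead, here days since the Unix epoch. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative tz-aware datetimes (assumed to resemble the column in the report)
s = pd.Series(
    pd.to_datetime(["2024-11-15 12:13:12", "2025-01-01 00:00:00"], utc=True)
)

# A tz-aware Series becomes an object array of Timestamps
as_objects = np.asarray(s)
print(as_objects.dtype, np.can_cast(as_objects.dtype, float, "same_kind"))

# Workaround sketch: express the datetimes as float days since the Unix epoch
epoch = pd.Timestamp("1970-01-01", tz="UTC")
epoch_days = (s - epoch) / pd.Timedelta(days=1)
as_floats = np.asarray(epoch_days)
print(as_floats.dtype, np.can_cast(as_floats.dtype, float, "same_kind"))
```

Storing such a float column on the GeoDataFrame and passing its name to `gdf.plot(column=..., legend=True)` should avoid the cast error, at the cost of less readable colorbar labels than decimal years would give.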

rhshadrach commented 12 hours ago

Thanks for the request. Can you provide example input, a proposed syntax for the operation, and the expected output?

dshean commented 10 hours ago

Sure. Something like df['dt_col'].dt.decyear could work well, using the dt accessor.

This would convert a column of datetime64 (e.g., 2024-11-15 12:13:12+00:00) to float64 (e.g., 2024.872976).
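
To make the proposed input and output concrete, here is a minimal sketch that approximates the syntax with a custom accessor registered through `pd.api.extensions.register_series_accessor`. The accessor name `dtx` and the class `DecimalYearAccessor` are hypothetical stand-ins, since `dt.decyear` does not exist in pandas. The sketch assumes no missing values, and it measures elapsed time on the wall clock after dropping any timezone.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("dtx")  # hypothetical accessor name
class DecimalYearAccessor:
    def __init__(self, series: pd.Series):
        self._s = series

    @property
    def decyear(self) -> pd.Series:
        s = self._s
        # Drop the timezone but keep the wall-clock time
        naive = s.dt.tz_localize(None) if s.dt.tz is not None else s
        year = naive.dt.year
        # Jan 1 of the current year and of the following year
        start = pd.to_datetime(year.astype(str), format="%Y")
        end = pd.to_datetime((year + 1).astype(str), format="%Y")
        # Fraction of the year elapsed (timedelta / timedelta gives float64)
        return year + (naive - start) / (end - start)


df = pd.DataFrame({"dt_col": pd.to_datetime(["2024-11-15 12:13:12"], utc=True)})
print(df["dt_col"].dtx.decyear)  # float64 Series, approximately 2024.872976
```

A built-in implementation would presumably live on the `dt` accessor and on `Timestamp` itself, and would need to decide how to treat NaT and DST transitions.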

AryanK1511 commented 7 hours ago

@rhshadrach if you don't mind, I would love to work on this issue