pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH/BUG: pd.date_range() still defaults to nanosecond resolution #59031

Open jorisvandenbossche opened 2 weeks ago

jorisvandenbossche commented 2 weeks ago

After https://github.com/pandas-dev/pandas/pull/55901, pd.to_datetime with string input now infers the resolution from the data, but the related pd.date_range, which also creates datetime data, still returns nanosecond resolution:

In [5]: pd.date_range("2012-01-01", periods=3, freq="1min")
Out[5]: 
DatetimeIndex(['2012-01-01 00:00:00', '2012-01-01 00:01:00',
               '2012-01-01 00:02:00'],
              dtype='datetime64[ns]', freq='min')

In [6]: pd.to_datetime(['2012-01-01 00:00:00', '2012-01-01 00:01:00', '2012-01-01 00:02:00'])
Out[6]: 
DatetimeIndex(['2012-01-01 00:00:00', '2012-01-01 00:01:00',
               '2012-01-01 00:02:00'],
              dtype='datetime64[s]', freq=None)

Should we update pd.date_range as well, so that it infers the resulting resolution from the start/end timestamps and freq?

(I encountered this inconsistency in the pyarrow tests, where we were using both idioms, one to create the result and the other to create the expected data, and the tests started failing because the dtypes differed. I also opened https://github.com/pandas-dev/pandas/issues/58989 for that. Regardless of what the default resolution ends up being, pd.date_range would still need to follow it as well.)
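
To make "infer the resolution from the start timestamp and freq" concrete, here is a minimal sketch of one possible rule: pick the coarsest unit that represents both the start and the step exactly. The helper name infer_unit and the NANOS_PER_UNIT table are hypothetical and not part of pandas, and this is not necessarily the rule pandas would adopt. It works on plain integer nanoseconds so that it runs with the standard library only.

```python
from datetime import datetime, timezone

# Hypothetical helper, NOT a pandas API: illustrates one possible inference rule.
# Units are listed from coarsest to finest, mirroring datetime64[s/ms/us/ns].
NANOS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def infer_unit(start_ns: int, step_ns: int) -> str:
    """Return the coarsest unit in which both start and step are whole numbers."""
    for unit, nanos in NANOS_PER_UNIT.items():
        if start_ns % nanos == 0 and step_ns % nanos == 0:
            return unit
    return "ns"  # unreachable in practice, since every integer is divisible by 1


# 2012-01-01 00:00:00 UTC expressed as nanoseconds since the epoch
start_ns = int(datetime(2012, 1, 1, tzinfo=timezone.utc).timestamp()) * 1_000_000_000
one_minute_ns = 60 * 1_000_000_000

print(infer_unit(start_ns, one_minute_ns))  # start and step are whole seconds
print(infer_unit(start_ns, 500_000_000))    # a 500 ms step needs milliseconds
print(infer_unit(start_ns + 1, one_minute_ns))  # a 1 ns offset forces nanoseconds
```

Under this rule the date_range call from the example above would come out as datetime64[s], matching the to_datetime result. Until the default changes, the unit keyword of pd.date_range (available since pandas 2.0) or DatetimeIndex.as_unit can be used to request a resolution explicitly.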

tilovashahrin commented 2 weeks ago

Hi @jorisvandenbossche I made a few additions to the date_range function. Let me know what you think!