pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: there is no year 0 #55462

Open jbrockmendel opened 1 year ago

jbrockmendel commented 1 year ago
import pandas as pd

ts = pd.Timestamp("001-01-01")
td = pd.Timedelta(days=1).as_unit("s")

>>> ts - td
Timestamp('0-12-31 00:00:00')
>>> (ts - td).year
0

We document in a few places that we are using the proleptic Gregorian calendar, which does not have a year 0. The year before 1 AD is 1 BC in this calendar.

np.datetime64 objects have the same behavior (cc @sberg)
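
For reference, a minimal sketch of the NumPy side of this, assuming day resolution; the variable names are only illustrative. It subtracts one day from 0001-01-01 and recovers the year number from the result:

```python
import numpy as np

# One day before 0001-01-01, at day resolution.
first_day = np.datetime64("0001-01-01")
previous_day = first_day - np.timedelta64(1, "D")

# datetime64[Y] stores the year as an offset from 1970, so add 1970 back.
year = int(previous_day.astype("datetime64[Y]").astype(np.int64)) + 1970

print(previous_day)
print(year)
```

Like the pandas example above, this lands in year 0 rather than skipping to 1 BC.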

Expected Behavior

Refuse to parse year 0 or to create a Timestamp with year=0, and have subtraction skip year 0.

Or document the current behavior.
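
As a point of comparison for the "refuse" option, the standard library's `datetime.date` stops at year 1 (`MINYEAR`). The sketch below is only an illustration of that behavior, not a proposal for pandas, and `try_previous_day` is a hypothetical helper name. Note that the stdlib does not skip to 1 BC either; it simply cannot represent anything before 0001-01-01.

```python
from datetime import MINYEAR, date, timedelta


def try_previous_day(d):
    """Return the day before d, or None if datetime.date cannot represent it."""
    try:
        return d - timedelta(days=1)
    except OverflowError:
        # Raised when the result would fall before date.min (0001-01-01).
        return None


# Constructing a date in year 0 is rejected outright.
try:
    date(0, 12, 31)
    year_zero_allowed = True
except ValueError:
    year_zero_allowed = False

print(MINYEAR)
print(try_previous_day(date(1, 1, 2)))
print(try_previous_day(date(1, 1, 1)))
print(year_zero_allowed)
```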

Kman303 commented 1 year ago

I'd like to try and fix this bug, if I could get the issue assigned to me.

jbrockmendel commented 1 year ago

I'd advise against working on this until there is consensus on how to address the issue.

jorisvandenbossche commented 1 year ago

FWIW, Arrow has the same behaviour as numpy:

In [9]: pc.subtract(pa.array([pd.Timestamp("001-01-01")]), pa.array([pd.Timedelta(days=1)]))
Out[9]: 
<pyarrow.lib.TimestampArray object at 0x7fd268b57e80>
[
  0000-12-31 00:00:00.000000
]

jorisvandenbossche commented 1 year ago

Quoting the linked wiki page:

For these calendars one can distinguish two systems of numbering years BC. Bede and later historians did not enumerate any year as zero (nulla in Latin; see Year zero); therefore the year preceding AD 1 is 1 BC. In this system the year 1 BC is a leap year (likewise in the proleptic Julian calendar). Mathematically, it is more convenient to include a year 0 and represent earlier years as negative numbers for the specific purpose of facilitating the calculation of the number of years between a negative (BC) year and a positive (AD) year. This is the convention in astronomical year numbering and the international standard date system, ISO 8601. In these systems, the year 0 is a leap year.

So this mentions that "the international standard date system, ISO 8601" uses a year zero. And in the end, one could also say that what we see in the output above is an ISO 8601 formatted timestamp?
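
To make the two numbering systems in the quote concrete, here is a small sketch that maps astronomical (ISO 8601 style) year numbers to BC/AD labels and applies the Gregorian leap-year rule to them. Both function names are hypothetical and are not part of pandas or NumPy.

```python
def astronomical_to_era(year):
    """Map an astronomical year number to a historical BC/AD label."""
    if year >= 1:
        return f"{year} AD"
    # Astronomical year 0 is 1 BC, year -1 is 2 BC, and so on.
    return f"{1 - year} BC"


def is_leap(year):
    """Gregorian leap-year rule applied to astronomical year numbers."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for y in (1, 0, -1, -4):
    print(y, astronomical_to_era(y), is_leap(y))
```

With this mapping, year 0 corresponds to 1 BC and comes out as a leap year, which matches both statements in the quoted passage.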

Kman303 commented 12 months ago

take

jorisvandenbossche commented 12 months ago

@Kman303, as @jbrockmendel mentioned above, there is not yet a clear resolution about what exactly needs to be done.

jbrockmendel commented 11 months ago

I like Joris's current-behavior-is-fine approach.

miccoli commented 10 months ago

As already commented by @jorisvandenbossche, the current behaviour is expected.

The NumPy docs ^1 clarify datetime64 semantics by listing the conventions adopted, namely

It seems to me that it would be safe to simply make a reference in the pandas docs to the NumPy ones.