ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

bug: timestamp casting from integers in polars is broken #9091

Open ncclementi opened 2 weeks ago

ncclementi commented 2 weeks ago

On main, casting integers (for example, epoch seconds) to timestamps and executing with the Polars backend gives wrong results.

import ibis
from datetime import datetime
import ibis.expr.datatypes as dt

con = ibis.polars.connect()
date = datetime(2015, 1, 1, 12, 34, 56)  # 2015-01-01 12:34:56

# epoch in seconds; "%s" is a platform-specific strftime extension that
# treats the naive datetime as local time
date_sec = int(date.strftime("%s"))

t = ibis.memtable([{"date_sec": date_sec, "date_mili": int(date_sec*1e3), "date_micro": int(date_sec*1e6), "date_nano": int(date_sec*1e9)}])

# this should recover the date, but the hours are off and the dtype is also us (?)
con.execute(t.date_sec.cast(dt.Timestamp()))
0   2015-01-01 17:34:56
Name: Cast(date_sec, timestamp), dtype: datetime64[us]

Casting the milliseconds, microseconds, and nanoseconds columns with the matching scale on the timestamp gives even stranger results.

milliseconds

con.execute(t.date_mili.cast(dt.Timestamp(scale=3)))
0   46972-03-30 14:13:20
Name: Cast(date_mili, timestamp(3)), dtype: datetime64[us]

microseconds

con.execute(t.date_micro.cast(dt.Timestamp(scale=6)))
0   -6447-07-18 19:52:44.525568
Name: Cast(date_micro, timestamp(6)), dtype: datetime64[us]

nanoseconds

con.execute(t.date_nano.cast(dt.Timestamp(scale=9)))
...
   1680 if background:
   1681     return InProcessQuery(ldf.collect_concurrently())
-> 1683 return wrap_df(ldf.collect())

PanicException: attempt to calculate the remainder with a divisor of zero

gforsyth commented 2 weeks ago

0   2015-01-01 17:34:56
Name: Cast(date_sec, timestamp), dtype: datetime64[us]

I don't think this is necessarily wrong?

The us thing is a red herring; that's just how pandas represents the precision of the timestamp.

Epoch seconds are the number of seconds since 1970-01-01 00:00:00 UTC, so a 5-hour offset would be correct given your timezone (in January, anyway).
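To make the timezone point concrete, here is a minimal standard-library sketch. It assumes the reporter's local zone was UTC-5 in January (inferred from the 5-hour difference, not stated in the report), and it does not touch ibis or polars at all.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2015, 1, 1, 12, 34, 56)

# Assumption: local time is UTC-5 (for example, US Eastern standard time).
local_tz = timezone(timedelta(hours=-5))

# Interpreting the naive datetime as local time, as strftime("%s") does.
epoch_local = int(naive.replace(tzinfo=local_tz).timestamp())

# Decoding those epoch seconds as UTC shifts the wall clock by 5 hours.
decoded_utc = datetime.fromtimestamp(epoch_local, tz=timezone.utc)
print(decoded_utc)  # 2015-01-01 17:34:56+00:00

# Interpreting the naive datetime as UTC instead round-trips unchanged.
epoch_utc = int(naive.replace(tzinfo=timezone.utc).timestamp())
roundtrip = datetime.fromtimestamp(epoch_utc, tz=timezone.utc)
print(roundtrip)  # 2015-01-01 12:34:56+00:00
```

Building the epoch value from a timezone-aware datetime, rather than from `strftime("%s")`, keeps the reproduction independent of the machine's local timezone.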

The other precisions are definitely a problem.
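For reference, a rough sketch of what each scaled cast would be expected to return, using only the standard library. `epoch_to_datetime` is a hypothetical helper, not an ibis or polars function, and `date_sec` is the epoch value implied by the seconds output in the report (2015-01-01 17:34:56 UTC).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_datetime(value: int, scale: int) -> datetime:
    """Decode an integer counting units of 10**-scale seconds since the epoch.

    Hypothetical helper for illustration only. Python datetimes store
    microseconds at most, so nanosecond inputs are truncated.
    """
    micros = value * 10**6 // 10**scale
    return EPOCH + timedelta(microseconds=micros)


date_sec = 1420133696  # 2015-01-01 17:34:56 UTC

# Every scale should decode to the same instant.
results = {
    scale: epoch_to_datetime(date_sec * 10**scale, scale)
    for scale in (0, 3, 6, 9)
}
for scale, value in results.items():
    print(scale, value)

# Observation, not a confirmed diagnosis: reading the millisecond value as
# plain seconds lands in roughly the year shown in the milliseconds output.
approx_year = 1970 + (date_sec * 10**3) / (365.2425 * 86400)
print(int(approx_year))
```

The last calculation suggests the millisecond input may be getting treated as seconds, but that is only a guess from the arithmetic and would need to be checked against the backend's cast implementation.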