Open ThibTrip opened 4 years ago
I have found the culprit; a similar problem came up in PR #11216.
So in the end the issue is not directly related to SQL and I think the example below pinpoints the problem. I don't see a solution other than discarding offsets.
import pandas as pd
import datetime
import psycopg2
from pandas.api.types import is_datetime64_any_dtype
# Create datetimes with different UTC offsets (60 and 120 minutes respectively).
data = [[datetime.datetime(2019, 11, 14, 16, 12,
                           tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=60))],
        [datetime.datetime(2019, 8, 7, 15, 37, 4,
                           tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=120))]]
# The differing offsets cause the column to be read as object dtype
# instead of a datetime dtype.
# pd.DataFrame.from_records is what the functions behind read_sql use.
df = pd.DataFrame.from_records(data, columns=['ts'])
df.dtypes
ts object
dtype: object
# also this outputs False instead of True
is_datetime64_any_dtype(df['ts'])
False
# DataFrame.to_sql runs pd.to_datetime on the column, but that
# does not work here because the offsets differ
pd.to_datetime(df['ts'])
pd.to_datetime(df['ts'], utc = True)
0 2019-11-14 15:12:00+00:00
1 2019-08-07 13:37:04+00:00
Name: ts, dtype: datetime64[ns, UTC]
I tested this directly in the master so you'll find the output of pd.show_versions() here again.
It's pandas' policy to convert TIMESTAMP WITH TIME ZONE database types to UTC in pandas, so converting the incoming data to UTC is an acceptable solution. Happy to have a PR with the change!
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#datetime-data-types
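As a rough illustration of the "convert incoming data to UTC" idea, here is a minimal sketch. The helper name `coerce_tzaware_columns_to_utc` is hypothetical and not part of pandas, and it is not necessarily how the eventual patch works. It uses `datetime.timezone` from the standard library instead of `psycopg2.tz.FixedOffsetTimezone` so that it runs without a database driver, on the assumption that both kinds of fixed-offset tzinfo behave the same way here.

```python
import datetime as dt

import pandas as pd
from pandas.api.types import is_object_dtype


def coerce_tzaware_columns_to_utc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which object columns of tz-aware datetimes become UTC.

    Hypothetical helper for illustration only; not a pandas API.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not is_object_dtype(col):
            continue
        values = col.dropna()
        # Only touch columns whose non-null values are all tz-aware datetimes.
        if len(values) and all(
            isinstance(v, dt.datetime) and v.tzinfo is not None for v in values
        ):
            out[name] = pd.to_datetime(col, utc=True)
    return out


# Two timestamps with different fixed offsets (+01:00 and +02:00).
data = [
    [dt.datetime(2019, 11, 14, 16, 12,
                 tzinfo=dt.timezone(dt.timedelta(minutes=60)))],
    [dt.datetime(2019, 8, 7, 15, 37, 4,
                 tzinfo=dt.timezone(dt.timedelta(minutes=120)))],
]
df = pd.DataFrame.from_records(data, columns=["ts"])
fixed = coerce_tzaware_columns_to_utc(df)
print(fixed.dtypes)
```

Converting to UTC discards the original offsets but preserves the instants, which matches the policy described in the linked documentation.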
Hi @mroeschke, thanks for your answer :). I was able to patch the to_sql function. I was also able to patch the read_sql function, but perhaps that is not the desired behavior. I will explain everything in a PR soon (after Christmas time).
take
Code Sample, a copy-pastable example if possible
Output
Problem description
There are 2 problems here:
1. pandas reads 2 columns of the test DataFrame I saved in postgres as "object" and not "datetime64[ns, UTC]", although in postgres the data types are all "timestamp with time zone".
2. When I attempt to append the postgres table to itself via pandas (pd.read_sql, then pd.DataFrame.to_sql), it fails when trying to convert the 2 "object" columns to datetime. So perhaps there is an issue with pd.to_datetime, or I am missing something here.
I would make another issue for the second problem but I cannot reproduce it other than with this workflow.
Expected Output
When reading the data from the postgres table I expect those datatypes:
Also I expect pd.DataFrame.to_sql to not throw any exception even in the case where "create_date" and "change_date" are of "object" dtypes.
Output of pd.show_versions()