snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-1368516: Creating a dataframe from pandas dataframe with datetime objects fail to set the right column type #1522

Open sfc-gh-stan opened 5 months ago

sfc-gh-stan commented 5 months ago
import datetime

import pandas as pd

# `session` is assumed to be an existing Snowpark Session with Local Testing enabled.
data = {
    "pandas_datetime": ["2021-09-30 12:00:00", "2021-09-30 13:00:00"],
    "date": [pd.to_datetime("2010-1-1"), pd.to_datetime("2011-1-1")],
    "datetime.datetime": [
        datetime.datetime(2010, 1, 1),
        datetime.datetime(2010, 1, 1),
    ],
}
pdf = pd.DataFrame(data)
pdf["pandas_datetime"] = pd.to_datetime(pdf["pandas_datetime"])
df = session.create_dataframe(pdf)
print(df.schema)

prints

StructType([StructField('"pandas_datetime"', LongType(), nullable=True), StructField('"date"', LongType(), nullable=True), StructField('"datetime.datetime"', LongType(), nullable=True)])

This can be traced to the utility function src/snowflake/snowpark/mock/_pandas_util.py::_extract_schema_and_data_from_pandas_df, which extracts the wrong schema for datetime columns. The bug blocks the test tests/integ/test_dataframe.py::test_create_dataframe_with_pandas_df from being enabled to run against Local Testing.
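
For illustration, here is a minimal sketch of the dtype-to-type mapping this report expects. It is not the actual code in _pandas_util.py: the helper name `expected_snowpark_type` is hypothetical, it works on dtype names (as returned by `str(series.dtype)`) and returns type names as plain strings so that it runs without pandas or Snowpark installed, and it assumes that all three columns in the repro end up with a timezone-naive `datetime64[...]` dtype.

```python
from typing import Optional


def expected_snowpark_type(dtype_name: str) -> Optional[str]:
    """Hypothetical helper: map a pandas dtype name to the Snowpark type
    name expected in the schema. Only the cases relevant to this issue
    are covered; anything else returns None."""
    # Timezone-naive datetime columns, e.g. "datetime64[ns]". Timezone-aware
    # dtypes such as "datetime64[ns, UTC]" contain a comma and are
    # deliberately left out of this sketch.
    if dtype_name.startswith("datetime64[") and "," not in dtype_name:
        return "TimestampType(tz=ntz)"
    # Plain 64-bit integers are the only case where LongType is expected.
    if dtype_name == "int64":
        return "LongType()"
    return None


# Assumption: each column in the repro has dtype "datetime64[ns]".
for column in ("pandas_datetime", "date", "datetime.datetime"):
    print(column, "->", expected_snowpark_type("datetime64[ns]"))
```

Under these assumptions every column should map to a timestamp type rather than LongType(), which is what the Local Testing schema above fails to do.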

sfc-gh-sghosh commented 5 months ago

Hello @sfc-gh-stan ,

Thanks for raising the issue.

Yes, the schema information is incorrect with Local Testing compared to the default session.

default session: StructType([StructField('"pandas_datetime"', TimestampType(tz=ntz), nullable=True), StructField('"date"', TimestampType(tz=ntz), nullable=True), StructField('"datetime.datetime"', TimestampType(tz=ntz), nullable=True)])

local_testing: StructType([StructField('"pandas_datetime"', LongType(), nullable=True), StructField('"date"', LongType(), nullable=True), StructField('"datetime.datetime"', LongType(), nullable=True)])

We will work on fixing this and post an update here.

Regards, Sujan