snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0
601 stars 473 forks source link

SNOW-1775418: Add argument to write_pandas() to not write index #2096

Closed stibu81 closed 2 weeks ago

stibu81 commented 3 weeks ago

What is the current behavior?

The function write_pandas() writes the index of the pandas DataFrame into a column "INDEX", if the index has the expected format. (Otherwise, there is a warning and the index is not written.)

What is the desired behavior?

It should be possible to set an argument (e.g., write_index=False) such that the index is never written.

How would this improve snowflake-connector-python?

This would allow the user to not write the index, if it is not needed.

References and other background

No response

sfc-gh-sghosh commented 2 weeks ago

Hello @stibu81 ,

Thanks for raising the request. Could you clarify the requirement?

At present we know that when write_pandas() is called, the function does not write the custom index to the Snowflake table. Instead, it ignores the index entirely because it is not a standard RangeIndex. In both cases, whether you have a custom index or a standard RangeIndex, the write_pandas() function does not save the index to the Snowflake table.

In case if custom index, it will throw a warning and index will not be written. Example:

`pandas_df = pd.DataFrame([(1, "Steve"), (2, "Bob")], columns=["id", "name"]) pandas_df.index = ['row1', 'row2'] # Custom index

session.write_pandas(pandas_df, "WRITE_PANDAS_TABLE2", auto_create_table=True, overwrite=True)

result_df=session.sql("SELECT * FROM WRITE_PANDAS_TABLE2").to_pandas() print(result_df)

output: UserWarning: Pandas Dataframe has non-standard index of type <class 'pandas.core.indexes.base.Index'> which will not be written. Consider changing the index to pd.RangeIndex(start=0,...,step=1) or call reset_index() to keep index as column(s) id name 0 1 Steve 1 2 Bob`

In case if RangeIndex, there will be no warning but index will be ignored. `pandas_df = pd.DataFrame([(1, "Steve"), (2, "Bob")], columns=["id", "name"]) session.write_pandas(pandas_df, "WRITE_PANDAS_TABLE2", auto_create_table=True, overwrite=True)

result_df=session.sql("SELECT * FROM WRITE_PANDAS_TABLE2").to_pandas() print(result_df)

output: id name 0 1 Steve 1 2 Bob`

Now whats your requirement with new option write_index=False ? you want an option to explicitly disable writing the index, even if it is a RangeIndex or if the DataFrame has no custom indexes?

Regards, Sujan

stibu81 commented 2 weeks ago

Hi Sujan

Thanks for you reply and the clarification. Indeed, using your examples I realised that no index is written. I made a simple mistake by forgetting that reset_index() not only resets the index, but also adds it as a column to the DataFrame. And then, of course, that column is written to snowflake.

So, in the end, write_pandas() is doing exactly what I would expect it to do and I was simply not careful enough. I'm sorry for the unnecessary confusion that I caused.

Best regards, Stefan