snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-977836: The withColumnRenamed function fails to rename a column if the Snowpark DataFrame has multiple columns with the same name but different letter case #1148

Open Ilyas-kipi opened 10 months ago

Ilyas-kipi commented 10 months ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]

  1. What operating system and processor architecture are you using?

Windows-10-10.0.19045-SP0

  1. What are the component versions in the environment (pip freeze)?

    altair==4.2.2
    asn1crypto==1.5.1
    attrs==23.1.0
    blinker==1.6.2
    cachetools==5.3.1
    certifi==2023.7.22
    cffi==1.15.1
    charset-normalizer==3.2.0
    click==8.1.7
    cloudpickle==2.0.0
    colorama==0.4.6
    cron-descriptor==1.4.0
    croniter==1.4.1
    cryptography==41.0.3
    entrypoints==0.4
    filelock==3.12.3
    gitdb==4.0.10
    GitPython==3.1.32
    greenlet==2.0.2
    idna==3.4
    importlib-metadata==6.8.0
    Jinja2==3.1.2
    jsonschema==4.19.0
    jsonschema-specifications==2023.7.1
    markdown-it-py==3.0.0
    MarkupSafe==2.1.3
    mdurl==0.1.2
    numpy==1.25.2
    oscrypto==1.3.0
    packaging==23.1
    pandas==2.0.3
    Pillow==10.0.0
    platformdirs==3.8.1
    plotly==5.16.1
    protobuf==3.20.3
    pyarrow==13.0.0
    pycparser==2.21
    pycryptodomex==3.18.0
    pydeck==0.8.1b0
    Pygments==2.16.1
    PyJWT==2.8.0
    Pympler==1.0.1
    pyOpenSSL==23.2.0
    python-dateutil==2.8.2
    pytz==2023.3
    PyYAML==6.0.1
    referencing==0.30.2
    requests==2.31.0
    rich==13.5.2
    rpds-py==0.10.0
    six==1.16.0
    smmap==5.0.0
    snowflake-connector-python==3.5.0
    snowflake-snowpark-python==1.10.0
    snowflake-sqlalchemy==1.5.0
    sortedcontainers==2.4.0
    SQLAlchemy==1.4.49
    streamlit==1.22.0
    tenacity==8.2.3
    toml==0.10.2
    tomlkit==0.12.1
    toolz==0.12.0
    tornado==6.3.3
    typing_extensions==4.7.1
    tzdata==2023.3
    tzlocal==5.0.1
    urllib3==1.26.16
    validators==0.21.2
    vega-datasets==0.9.0
    watchdog==3.0.0
    zipp==3.16.2

  1. What did you do?

    from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType
    # `session` is assumed to be an already-created snowflake.snowpark.Session
    schema = StructType([StructField("id", IntegerType()), StructField("Snow Flake", StringType()), StructField("SNOW FLAKE", StringType())])
    df = session.create_dataframe([[1, "snow", "flake"], [3, "snow", "flake"]], schema)
    # Two columns differ only by case; renaming one of them raises the exception reported below
    df.with_column_renamed('"Snow Flake"', '"Snow Flake Renamed"').show()
  2. What did you expect to see?

    I expected the column "Snow Flake" to be renamed to "Snow Flake Renamed", but I ran into the following exception:

    SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".

suenalaba commented 9 months ago

If duplicate column names are allowed by design in Snowpark's API, then I think we should give the user a way to rename columns whose names are duplicated.

Example:

    duplicated_col    duplicated_col
    'some-value'      'some-value-2'

should be able to be changed to:

    new_col_name_1    new_col_name_2
    'some-value'      'some-value-2'

Instead of just throwing the error you showed.

Ilyas-kipi commented 9 months ago

Hello @suenalaba, sorry if I was not clear. By definition, we cannot create a Snowpark DataFrame with ambiguous column names, but we can create one that has two columns with the same name in different case, i.e. column 1 is named "Snow Flake" and column 2 is named "SNOW FLAKE".

Example DataFrame: [screenshot "Capture"]

When we try to rename column 1 (i.e. "Snow Flake") to "Snow Flake Renamed" using the command below,

    df.with_column_renamed('"Snow Flake"','"Snow Flake Renamed"').show()

we run into an exception:

    SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".

This happens because the current implementation of the with_column_renamed method converts the name of the column being renamed to upper case and then checks whether more than one column matches that name. In our case, the DataFrame also has a column that matches upper("Snow Flake"), so two columns match and we run into this exception. I've addressed this in my PR #1149.
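
To illustrate the behaviour described above, here is a minimal pure-Python sketch. It is not the actual Snowpark source and not the change in the PR; the helper names `strip_quotes`, `count_matches_upper`, and `count_matches_exact` are hypothetical and only model the two ways of comparing column names.

```python
def strip_quotes(name: str) -> str:
    """Drop one pair of surrounding double quotes, if present."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name


def count_matches_upper(columns: list[str], target: str) -> int:
    """Model of the reported behaviour: names are upper-cased before comparing."""
    wanted = strip_quotes(target).upper()
    return sum(1 for c in columns if strip_quotes(c).upper() == wanted)


def count_matches_exact(columns: list[str], target: str) -> int:
    """Alternative: compare the quoted identifiers exactly, preserving case."""
    return sum(1 for c in columns if c == target)


# Hypothetical column list mirroring the schema from the report.
columns = ['"ID"', '"Snow Flake"', '"SNOW FLAKE"']

print(count_matches_upper(columns, '"Snow Flake"'))  # 2 -> looks like a duplicate
print(count_matches_exact(columns, '"Snow Flake"'))  # 1 -> unambiguous
```

With the upper-cased comparison, both "Snow Flake" and "SNOW FLAKE" count as matches, which corresponds to the "2 columns named" message in the exception. A case-preserving comparison finds only one.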