Snowflake Snowpark Python API
SNOW-977836: The withColumnRenamed function fails to rename a column if the Snowpark DataFrame has multiple columns with the same name but different case

Open Ilyas-kipi opened 10 months ago

Ilyas-kipi commented 10 months ago

  1. What version of Python are you using?

Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

  3. What are the component versions in the environment (pip freeze)?

  4. What did you do?

    from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType

    # Assumes an existing Snowpark `session`.
    # The last two column names differ only by case.
    schema = StructType([StructField("id", IntegerType()), StructField("Snow Flake", StringType()), StructField("SNOW FLAKE", StringType())])
    df = session.create_dataframe([[1, "snow", "flake"], [3, "snow", "flake"]], schema)
    # Rename only the mixed-case column.
    df.with_column_renamed('"Snow Flake"', '"Snow Flake Renamed"').show()

  5. What did you expect to see?

    I expected the column 'Snow Flake' to be renamed to 'Snow Flake Renamed', but I ran into the following exception:

    SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".

suenalaba commented 9 months ago

If duplicate column names are allowed by design in Snowpark's API, then I think we should give the user a way to rename columns whose names are duplicated.

Example:

    duplicated_col    duplicated_col
    'some-value'      'some-value-2'

should be able to be changed to:

    new_col_name_1    new_col_name_2
    'some-value'      'some-value-2'

Instead of just throwing the error you showed.
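As a rough sketch of the idea, duplicated names could be replaced by position rather than by name. The helper below is hypothetical (`rename_by_position` is not an existing Snowpark function) and works on a plain list of column names only:

```python
from typing import Dict, List


def rename_by_position(columns: List[str], new_names: Dict[int, str]) -> List[str]:
    """Return a copy of `columns` with the names at the given 0-based positions replaced.

    Hypothetical helper for illustration; not part of the Snowpark API.
    """
    for idx in new_names:
        if not 0 <= idx < len(columns):
            raise IndexError(f"column position {idx} is out of range")
    # Positions are unambiguous even when several names are identical.
    return [new_names.get(i, name) for i, name in enumerate(columns)]


columns = ["duplicated_col", "duplicated_col"]
renamed = rename_by_position(columns, {0: "new_col_name_1", 1: "new_col_name_2"})
print(renamed)
```

The resulting list could then presumably be applied to a DataFrame with something like `df.to_df(*renamed)`, which assigns new names to all columns in order.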

Ilyas-kipi commented 9 months ago

Hello @suenalaba, sorry if I was not clear. By definition, we cannot create a Snowpark DataFrame with ambiguous column names, but we can create a Snowpark DataFrame that has two columns with the same name but different case, i.e. column 1 is named "Snow Flake" and column 2 is named "SNOW FLAKE".

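Example DataFrame (built from the data in the reproduction code above):

    id    "Snow Flake"    "SNOW FLAKE"
    1     snow            flake
    3     snow            flake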

When we try to rename column 1, i.e. "Snow Flake", to "Snow Flake Renamed" using the command below:

    df.with_column_renamed('"Snow Flake"','"Snow Flake Renamed"').show()

we run into this exception:

    SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".

This is because, in the current implementation of the with_column_renamed method, the name of the column being renamed is converted to upper case and then checked for other columns with the same name. In our case, the DataFrame has another column that matches upper("Snow Flake"), and hence we run into this exception. I've addressed this in my PR #1149.
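To illustrate the difference, here is a simplified model in plain Python. It is not the actual Snowpark source or the code from the PR; the function names are made up for the example, and column names are assumed to be already-quoted identifiers held in a list:

```python
from typing import Callable, List


def match_upper(columns: List[str], old_name: str) -> List[str]:
    """Simplified model of the behaviour described above: names are compared upper-cased."""
    return [c for c in columns if c.upper() == old_name.upper()]


def match_exact(columns: List[str], old_name: str) -> List[str]:
    """Case-sensitive alternative: a quoted name must match exactly."""
    return [c for c in columns if c == old_name]


def rename_column(
    columns: List[str],
    old_name: str,
    new_name: str,
    matcher: Callable[[List[str], str], List[str]],
) -> List[str]:
    """Rename the single column selected by `matcher`; fail if zero or several match."""
    matches = matcher(columns, old_name)
    if len(matches) != 1:
        raise ValueError(
            f"Unable to rename {old_name} as {new_name}: {len(matches)} columns matched"
        )
    return [new_name if c == matches[0] else c for c in columns]


columns = ['"ID"', '"Snow Flake"', '"SNOW FLAKE"']

# Upper-cased comparison sees two candidates and refuses to rename.
try:
    rename_column(columns, '"Snow Flake"', '"Snow Flake Renamed"', match_upper)
except ValueError as exc:
    print("upper-cased match:", exc)

# Exact comparison finds exactly one column and renames it.
print("exact match:", rename_column(columns, '"Snow Flake"', '"Snow Flake Renamed"', match_exact))
```

With the upper-cased comparison, both "Snow Flake" and "SNOW FLAKE" are treated as the same name, which reproduces the "2 columns named" situation; comparing the quoted names exactly leaves only one candidate.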