Open Ilyas-kipi opened 10 months ago
If duplicate column names are allowed by design in Snowpark's API, then I think we should give the user a way to rename the duplicated columns.
Example:

| duplicated_col | duplicated_col |
|---|---|
| 'some-value' | 'some-value-2' |
should be able to be changed to:

| new_col_name_1 | new_col_name_2 |
|---|---|
| 'some-value' | 'some-value-2' |
Instead of just throwing the error you showed.
Hello @suenalaba, sorry if I was not clear. By definition, we cannot create a Snowpark DataFrame with ambiguous column names, but we can create one that has two columns with the same name in different case, i.e. column 1 is named "Snow Flake" and column 2 is named "SNOW FLAKE".
Example DataFrame:
When we try to rename column 1, i.e. "Snow Flake", to "Snow Flake Renamed" using the command `df.with_column_renamed('"Snow Flake"','"Snow Flake Renamed"').show()`,
we run into an exception: SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".
This happens because the current implementation of the with_column_renamed method converts the name of the column being renamed to upper case and then checks whether any other columns have the same name. In our case, the DataFrame has a second column that matches upper("Snow Flake"), so we run into this exception. I've addressed this in my PR #1149.
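To make the mismatch concrete, here is a minimal pure-Python sketch of the two comparison strategies. It is not the Snowpark source and not the code in the PR: the helper names (`strip_quotes`, `normalize`, `count_matches_upper`, `count_matches_quote_aware`) are hypothetical, and it assumes Snowflake's usual identifier rule that double-quoted names keep their case while unquoted names fold to upper case. Escaped quotes inside identifiers are ignored for brevity.

```python
def strip_quotes(name: str) -> str:
    """Return the identifier without its surrounding double quotes, if any."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name


def normalize(name: str) -> str:
    """Quoted identifiers keep their case; unquoted ones fold to upper case."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.upper()


def count_matches_upper(columns, target):
    """Upper-case everything before comparing (the behaviour described above)."""
    wanted = strip_quotes(target).upper()
    return sum(1 for c in columns if strip_quotes(c).upper() == wanted)


def count_matches_quote_aware(columns, target):
    """Compare names only after applying the quoting rule to each one."""
    wanted = normalize(target)
    return sum(1 for c in columns if normalize(c) == wanted)


columns = ['"Snow Flake"', '"SNOW FLAKE"']

# Upper-casing makes both columns look like the rename target.
upper_matches = count_matches_upper(columns, '"Snow Flake"')

# Respecting the quotes finds exactly one matching column.
aware_matches = count_matches_quote_aware(columns, '"Snow Flake"')

print(upper_matches, aware_matches)
```

With the upper-casing check, a rename of "Snow Flake" sees two candidate columns and would be rejected as ambiguous; with the quote-aware check, only the intended column matches.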
Please answer these questions before submitting your issue. Thanks!
Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
Windows-10-10.0.19045-SP0
pip freeze
altair==4.2.2 asn1crypto==1.5.1 attrs==23.1.0 blinker==1.6.2 cachetools==5.3.1 certifi==2023.7.22 cffi==1.15.1 charset-normalizer==3.2.0 click==8.1.7 cloudpickle==2.0.0 colorama==0.4.6 cron-descriptor==1.4.0 croniter==1.4.1 cryptography==41.0.3 entrypoints==0.4 filelock==3.12.3 gitdb==4.0.10 GitPython==3.1.32 greenlet==2.0.2 idna==3.4 importlib-metadata==6.8.0 Jinja2==3.1.2 jsonschema==4.19.0 jsonschema-specifications==2023.7.1 markdown-it-py==3.0.0 MarkupSafe==2.1.3 mdurl==0.1.2 numpy==1.25.2 oscrypto==1.3.0 packaging==23.1 pandas==2.0.3 Pillow==10.0.0 platformdirs==3.8.1 plotly==5.16.1 protobuf==3.20.3 pyarrow==13.0.0 pycparser==2.21 pycryptodomex==3.18.0 pydeck==0.8.1b0 Pygments==2.16.1 PyJWT==2.8.0 Pympler==1.0.1 pyOpenSSL==23.2.0 python-dateutil==2.8.2 pytz==2023.3 PyYAML==6.0.1 referencing==0.30.2 requests==2.31.0 rich==13.5.2 rpds-py==0.10.0 six==1.16.0 smmap==5.0.0 snowflake-connector-python==3.5.0 snowflake-snowpark-python==1.10.0 snowflake-sqlalchemy==1.5.0 sortedcontainers==2.4.0 SQLAlchemy==1.4.49 streamlit==1.22.0 tenacity==8.2.3 toml==0.10.2 tomlkit==0.12.1 toolz==0.12.0 tornado==6.3.3 typing_extensions==4.7.1 tzdata==2023.3 tzlocal==5.0.1 urllib3==1.26.16 validators==0.21.2 vega-datasets==0.9.0 watchdog==3.0.0 zipp==3.16.2
What did you do?
What did you expect to see?
I expected the column "Snow Flake" to be renamed to "Snow Flake Renamed", but I ran into the following exception:
SnowparkColumnException: Unable to rename the column "Snow Flake" as "Snow Flake Renamed" because this DataFrame has 2 columns named "Snow Flake".