snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0
275 stars 112 forks source link

SNOW-719751: ARRAY_AGG is missing options for WITHIN GROUP ( <orderby_clause> ) and partition by #645

Open orellabac opened 1 year ago

orellabac commented 1 year ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.8.12

  1. What operating system and processor architecture are you using?

Linux-5.4.0-1094-azure-x86_64-with-glibc2.17

  1. What are the component versions in the environment (pip freeze)?

aiohttp==3.8.3 aiosignal==1.3.1 alembic==1.9.0 altair==4.2.0 anyio==3.6.2 apache-airflow-providers-common-sql==1.3.1 apache-airflow-providers-ftp==3.2.0 apache-airflow-providers-http==4.1.0 apache-airflow-providers-imap==3.1.0 apache-airflow-providers-sqlite==3.3.1 apispec==3.3.2 argcomplete==2.0.0 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arrow==1.2.3 asn1crypto==1.5.1 astor==0.8.1 astroid==2.9.3 asttokens==2.2.1 async-generator==1.10 async-timeout==4.0.2 attrs==22.2.0 Babel==2.11.0 backcall==0.2.0 backports.zoneinfo==0.2.1 bds-testing @ file:///bds_testing-0.0.1-py3-none-any.whl beakerx-kernel-scala==2.0.0 beautifulsoup4==4.11.1 bleach==5.0.1 blinker==1.5 bottle==0.12.23 brotlipy==0.7.0 cachelib==0.9.0 cachetools==5.2.0 cattrs==22.2.0 certifi==2021.10.8 certipy==0.1.3 cffi @ file:///opt/conda/conda-bld/cffi_1642701102775/work charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work click==8.0.4 clickclick==20.10.2 cloudpickle==2.0.0 colorama==0.4.6 colorlog==4.8.0 comm==0.1.2 commonmark==0.9.1 conda==4.11.0 conda-content-trust @ file:///tmp/build/80754af9/conda-content-trust_1617045594566/work conda-package-handling @ file:///tmp/build/80754af9/conda-package-handling_1618262148928/work ConfigUpdater==3.1.1 coverage==6.5.0 cron-descriptor==1.2.32 croniter==1.3.8 cryptography @ file:///tmp/build/80754af9/cryptography_1639400846433/work debugpy==1.6.4 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.13 dill==0.3.1.1 dnspython==2.2.1 docutils==0.19 email-validator==1.3.0 entrypoints==0.4 et-xmlfile==1.1.0 eventlet==0.33.2 exceptiongroup==1.0.4 executing==1.2.0 fastjsonschema==2.16.2 filelock==3.8.2 findspark==2.0.1 Flask==2.2.2 Flask-Babel==2.0.0 Flask-Caching==2.0.1 Flask-JWT-Extended==4.4.4 Flask-Login==0.6.2 Flask-Session==0.4.0 Flask-SQLAlchemy==2.5.1 Flask-WTF==1.0.1 fqdn==1.5.1 frozenlist==1.3.3 fuzzywuzzy==0.18.0 gevent==22.10.2 gitdb==4.0.10 GitPython==3.1.29 google-auth==2.15.0 google-auth-oauthlib==0.8.0 graphviz==0.20.1 greenlet==2.0.1 gunicorn==20.1.0 h11==0.14.0 h3==3.7.4 html2text==2020.1.16 httpcore==0.16.3 httpx==0.23.1 idna @ file:///tmp/build/80754af9/idna_1637925883363/work importlib-metadata==5.2.0 importlib-resources==5.10.1 inflection==0.5.1 iniconfig==1.1.1 ipykernel==6.19.4 ipython==8.7.0 ipython-genutils==0.2.0 ipywidgets==8.0.4 isodate==0.6.1 isoduration==20.11.0 isort==5.11.4 itsdangerous==2.1.2 jedi==0.18.2 Jinja2==3.1.2 jsonpath-ng==1.5.3 jsonpointer==2.3 jsonschema==4.17.3 jupyter==1.0.0 jupyter-console==6.4.4 jupyter-events==0.5.0 jupyter-server-proxy==1.5.2 jupyter-telemetry==0.1.0 jupyter-vscode-proxy==0.1 jupyter_client==7.4.8 jupyter_core==5.1.1 jupyter_server_terminals==0.4.3 jupyterhub==1.4.2 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.5 lazy-object-proxy==1.8.0 Levenshtein==0.20.8 linkify-it-py==2.0.0 lockfile==0.12.2 lxml==4.9.2 Mako==1.2.4 Markdown==3.4.1 markdown-it-py==2.1.0 MarkupSafe==2.1.1 marshmallow==3.19.0 marshmallow-enum==1.5.1 matplotlib-inline==0.1.6 mccabe==0.6.1 mdit-py-plugins==0.3.1 mdurl==0.1.2 metakernel==0.29.4 mistune==2.0.4 multidict==6.0.3 nbclient==0.7.2 nbconvert==7.2.7 nbformat==5.7.1 nest-asyncio==1.5.6 networkx==2.8.8 notebook==6.4.1 numpy==1.24.0 oauthlib==3.2.2 openpyxl==3.0.10 oscrypto==1.3.0 packaging==22.0 pamela==1.0.0 pandas==1.5.2 pandocfilters==1.5.0 parso==0.8.3 pathspec==0.9.0 pendulum==2.1.2 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.3.0 pkgutil_resolve_name==1.3.10 platformdirs==2.6.0 pluggy==1.0.0 ply==3.11 prison==0.2.1 prometheus-client==0.15.0 prompt-toolkit==3.0.36 protobuf==4.21.12 proxy.py==2.4.3 psutil==5.9.4 ptyprocess==0.7.0 pure-eval==0.2.2 py==1.11.0 py4j==0.10.9.5 pyarrow==8.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycobertura==3.0.0 pycosat==0.6.3 pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work pycryptodomex==3.16.0 pydeck==0.8.0 pydotplus==2.0.2 Pygments==2.13.0 PyJWT==2.6.0 pylint==2.12.2 Pympler==1.0.1 pyngrok==5.2.1 pyodbc==4.0.35 pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1635333100036/work pyparsing==3.0.9 pypi-search==1.2.1 pyrsistent==0.19.2 PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work pytest==6.2.5 pytest-csv==3.0.0 pytest-excel==1.5.0 python-daemon==2.3.2 python-dateutil==2.8.2 python-decouple==3.6 python-gitlab-api==0.2.32 python-json-logger==2.0.4 python-Levenshtein==0.20.8 python-nvd3==0.15.0 python-slugify==7.0.0 pytz==2022.7 pytz-deprecation-shim==0.1.0.post0 pytzdata==2020.1 PyYAML==6.0 pyzmq==24.0.1 qtconsole==5.4.0 QtPy==2.3.0 rapidfuzz==2.13.7 rdflib==6.2.0 requests @ file:///opt/conda/conda-bld/requests_1641824580448/work requests-oauthlib==1.3.1 requests-toolbelt==0.10.1 requirements-parser==0.5.0 rfc3339-validator==0.1.4 rfc3986==1.5.0 rfc3986-validator==0.1.1 rich==12.6.0 rsa==4.9 ruamel-yaml-conda @ file:///tmp/build/80754af9/ruamel_yaml_1616016699510/work ruamel.yaml==0.17.21 ruamel.yaml.clib==0.2.7 semver==2.13.0 Send2Trash==1.8.0 setproctitle==1.3.2 shortuuid==1.0.11 simpervisor==0.4 six @ file:///tmp/build/80754af9/six_1623709665295/work smmap==5.0.0 sniffio==1.3.0 snowconvert-deploy-tool==0.0.20 snowconvert-helpers==2.0.14 snowflake-connector-python==2.9.0 snowflake-snowpark-python==1.0.0 soupsieve==2.3.2.post1 spylon==0.3.0 spylon-kernel==0.4.1 SQLAlchemy==1.4.45 SQLAlchemy-JSONField==1.0.0 SQLAlchemy-Utils==0.39.0 sqlparse==0.4.3 stack-data==0.6.2 streamlit==1.8.1 swagger-ui-bundle==0.0.9 tabulate==0.9.0 tenacity==8.1.0 termcolor==2.1.1 terminado==0.17.1 text-unidecode==1.3 tinycss2==1.2.1 toml==0.10.2 toolz==0.12.0 tornado==6.2 tqdm @ file:///tmp/build/80754af9/tqdm_1635330843403/work traitlets==5.8.0 typer==0.7.0 types-setuptools==65.6.0.2 typing_extensions==4.4.0 tzdata==2022.7 tzlocal==4.2 uc-micro-py==1.0.1 unicodecsv==0.14.1 uri-template==1.2.0 url-normalize==1.4.3 urllib3==1.26.7 validators==0.20.0 watchdog==2.2.0 wcwidth==0.2.5 webcolors==1.12 webencodings==0.5.1 websocket-client==1.4.2 Werkzeug==2.2.2 widgetsnbextension==4.0.5 wrapt==1.13.3 WTForms==3.0.1 xlrd==2.0.1 XlsxWriter==3.0.3 yarl==1.8.2 zipp==3.11.0 zope.event==4.6 zope.interface==5.5.2

  1. What did you do?

I tried to use the array_agg with those options but there is no within_group nor partition options

  1. What did you expect to see?

    A viable option to use those options when a group by clause is used

  2. Can you set logging to DEBUG and collect the logs?

N/A

orellabac commented 1 year ago

Currently the only workaround I have seen is to use a sql_expr

sfc-gh-yixie commented 1 year ago

@orellabac Could you try this?

array_agg("a").within_group(col("b").desc())
sfc-gh-yixie commented 1 year ago
        >>> df_array_agg_window = df.select(array_agg("a").within_group(col("a").desc()).over(Window.partitionBy(col("b"))).alias("new_column"))
        >>> df_array_agg_window.show()