snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-1635365: ImportError when importing modin.pandas #2139

Closed nkrishnan closed 1 week ago

nkrishnan commented 3 weeks ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    3.11 (though it also fails with versions 3.10 and 3.9)

  2. What operating system and processor architecture are you using?

    Linux-5.4.181-99.354.amzn2.aarch64-aarch64-with-glibc2.34 (Snowflake console)

  3. What are the component versions in the environment (pip freeze)?

Bottleneck @ file:///croot/bottleneck_1707864228049/work
cloudpickle @ file:///croot/cloudpickle_1683039989541/work
numexpr @ file:///croot/numexpr_1683221807349/work
numpy @ file:///croot/numpy_and_numpy_base_1682520579168/work
pandas @ file:///croot/pandas_1718308985289/work/dist/pandas-2.2.2-cp311-cp311-linux_aarch64.whl#sha256=674db162e43bbc538e97ef39726edb7be6c1eec6bdaad45776fb032165cecd42
pip==24.0
pyarrow @ file:///croot/pyarrow_1721664224167/work/python
python-dateutil @ file:///croot/python-dateutil_1694417999291/work
pytz @ file:///croot/pytz_1713974315080/work
PyYAML @ file:///croot/pyyaml_1698096055839/work
setuptools==69.5.1
six @ file:///tmp/build/80754af9/six_1644875935023/work
snowflake-connector-python @ file:///repo/conda-bld/stored-proc-python-connector_1716236429655/work
snowflake-snowpark-python @ file:///croot/snowflake-snowpark-python_1721429091110/work
typing_extensions @ file:///croot/typing_extensions_1715268839942/work
tzdata @ file:///croot/python-tzdata_1690578112552/work
wheel==0.43.0

  4. What did you do?

import snowflake.snowpark.modin.pandas as pd

def main(session):
  return "1"

  5. What did you expect to see?

    The import should succeed.

  6. Can you set logging to DEBUG and collect the logs?

Traceback (most recent call last):
  Worksheet, line 6, in <module>
  File "snowflake/snowpark/modin/pandas/__init__.py", line 93, in <module>
    from snowflake.snowpark.modin.pandas.dataframe import DataFrame
  File "snowflake/snowpark/modin/pandas/dataframe.py", line 76, in <module>
    from snowflake.snowpark.modin.pandas.base import _ATTRS_NO_LOOKUP, BasePandasDataset
  File "snowflake/snowpark/modin/pandas/base.py", line 77, in <module>
    from snowflake.snowpark.modin.pandas.utils import (
  File "snowflake/snowpark/modin/pandas/utils.py", line 46, in <module>
    from snowflake.snowpark.modin.core.execution.dispatching.factories.dispatcher import (
  File "snowflake/snowpark/modin/core/execution/dispatching/factories/__init__.py", line 24, in <module>
    from snowflake.snowpark.modin.core.execution.dispatching.factories import (  # noqa: F401
  File "snowflake/snowpark/modin/core/execution/dispatching/factories/factories.py", line 36, in <module>
    from snowflake.snowpark.modin.core.execution.dispatching.factories.baseio import BaseIO
  File "snowflake/snowpark/modin/core/execution/dispatching/factories/baseio.py", line 36, in <module>
    from snowflake.snowpark.modin.utils import _inherit_docstrings
  File "snowflake/snowpark/modin/utils.py", line 53, in <module>
    from snowflake.snowpark.modin.plugin.utils.error_message import ErrorMessage
  File "snowflake/snowpark/modin/plugin/__init__.py", line 62, in <module>
    from snowflake.snowpark.modin.plugin import docstrings  # isort: skip  # noqa: E402
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "snowflake/snowpark/modin/plugin/docstrings/__init__.py", line 14, in <module>
    from snowflake.snowpark.modin.plugin.docstrings.series import Series
  File "snowflake/snowpark/modin/plugin/docstrings/series.py", line 16, in <module>
    from snowflake.snowpark.modin.utils import _create_operator_docstring
ImportError: cannot import name '_create_operator_docstring' from partially initialized module 'snowflake.snowpark.modin.utils' (most likely due to a circular import) (/usr/lib/python_udf/c99e89e498caeef04ed9ac7328379977bc8cbae72867ffed5bfa8936f3d9fdb1/lib/python3.11/site-packages/snowflake/snowpark/modin/utils.py)
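For readers unfamiliar with the "partially initialized module" message in the traceback above, the following is a minimal sketch of how Python produces it. It does not use Snowpark at all: `demo_utils` and `demo_docs` are hypothetical modules written to a temporary directory, standing in for the pattern where `utils.py` starts importing another module before it has finished defining the name that module asks for.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def reproduce_circular_import() -> str:
    """Build two hypothetical modules that import each other; return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # demo_utils imports demo_docs *before* it defines `helper`.
        (root / "demo_utils.py").write_text(textwrap.dedent("""
            import demo_docs

            def helper():
                return "ok"
        """))
        # demo_docs asks for `helper` while demo_utils is still half-executed.
        (root / "demo_docs.py").write_text("from demo_utils import helper\n")

        sys.path.insert(0, tmp)
        try:
            importlib.import_module("demo_utils")
        except ImportError as exc:
            return str(exc)
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_utils", None)
            sys.modules.pop("demo_docs", None)
    return ""


message = reproduce_circular_import()
print(message)
```

The printed message has the same shape as the one in the report: the name being imported, the partially initialized module, and the "most likely due to a circular import" hint.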

sfc-gh-sghosh commented 3 weeks ago

Hello @nkrishnan ,

Thanks for raising the issue. I just tried the code and it's working; there is no error. I used Python 3.11, Snowpark 1.19.0, and pandas 2.2.1, and installed modin as well (pip install "snowflake-snowpark-python[modin]") as per the documentation.

Prerequisites: Python 3.9, 3.10 or 3.11, modin version 0.28.1, and pandas version 2.2.1 are required.
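A quick way to compare an environment against these prerequisites is sketched below, using only the standard library. The function name `check_prerequisites` and the `REQUIRED` table are illustrative, and treating the quoted versions as exact pins is an assumption made for the sketch; the documentation may allow a wider range.

```python
import sys
from importlib import metadata

# Versions quoted in the comment above, treated as exact pins for illustration only.
REQUIRED = {"modin": "0.28.1", "pandas": "2.2.1"}


def check_prerequisites(required=REQUIRED):
    """Return {package: (installed_version_or_None, required_version, matches)}."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted, installed == wanted)
    return report


# Python 3.9, 3.10, or 3.11, per the prerequisites above.
python_ok = (3, 9) <= sys.version_info[:2] <= (3, 11)

report = check_prerequisites()
print("Python version supported:", python_ok)
for package, (installed, wanted, matches) in report.items():
    print(f"{package}: installed={installed} required={wanted} match={matches}")
```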

No error for the code snippets below:

import snowflake.snowpark.modin.pandas as pd

def main(session):
  return "1"

import modin.pandas as pd
import snowflake.snowpark.modin.plugin
from snowflake.snowpark import Session

# Session.builder.create() will create a default Snowflake connection.
#Session.builder.create()

snowpark_df = pd.session.sql('select * from MYCSVTABLE1')
snowpark_df.show()

--------------------------------------
|"SEQ"  |"LAST_NAME"  |"FIRST_NAME"  |
--------------------------------------
|1      |AAAAAAA      |Maxwell       |
|2      |AAABBB       |Craig         |
|3      |newuser      |zxxzx         |
|4      |forthuser    |forthlast     |
|1      |AAAAAAA      |Maxwell       |
|2      |AAABBB       |Craig         |
|3      |newuser      |zxxzx         |
|4      |forthuser    |forthlast     |
|1      |Maxwell      |AAAAAAA       |
|2      |Craig        |AAABBB        |

nkrishnan commented 2 weeks ago

With python=3.11, modin=0.28.1, pandas=2.2.1 I get the same error with:

import snowflake.snowpark.modin.pandas as pd

def main(session):
    return "1"

That import is per documentation here: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/modin/pandas_api/snowflake.snowpark.modin.pandas.read_snowflake

This does work:

import modin.pandas as pd
import snowflake.snowpark.modin.plugin

def main(session):
    return "1"

Looks like it's due to which import is attempted.

sfc-gh-joshi commented 1 week ago

@nkrishnan The second import (import modin.pandas as pd; import snowflake.snowpark.modin.plugin) is meant to be the only one that works: in v1.18 and later, the first import (import snowflake.snowpark.modin.pandas as pd) is no longer supposed to work.

> That import is per documentation here: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/modin/pandas_api/snowflake.snowpark.modin.pandas.read_snowflake

Thanks for pointing this out. I'll update the documentation for the next release to use the correct import statement.
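To summarize the resolution as code, here is a small sketch of the import pattern described in this thread for v1.18 and later: import `modin.pandas` first, then `snowflake.snowpark.modin.plugin`. The wrapper function `load_snowpark_pandas` is a hypothetical helper, not part of Snowpark; it only exists so the snippet degrades gracefully when modin or Snowpark is not installed.

```python
def load_snowpark_pandas():
    """Return modin.pandas with the Snowflake plugin imported, or None if unavailable.

    Hypothetical helper: it only wraps the two imports named in this thread.
    """
    try:
        import modin.pandas as pd
        import snowflake.snowpark.modin.plugin  # noqa: F401  (imported for its side effects)
    except ImportError:
        # modin and/or snowflake-snowpark-python[modin] is not installed here.
        return None
    return pd


pd = load_snowpark_pandas()
if pd is None:
    print("Snowpark pandas is not available in this environment")
else:
    print("Loaded", pd.__name__)
```

Inside a worksheet or stored procedure the two import statements can simply appear at the top of the file, as in the working example earlier in the thread.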