snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-1045584: Pylint (3.0.3) throws errors when using the `dataframe.collect()` method #1241

Open EloiSanchez opened 9 months ago

EloiSanchez commented 9 months ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.10.12

  2. What operating system and processor architecture are you using?

    Linux-6.5.0-15-generic-x86_64-with-glibc2.35

  3. What are the component versions in the environment (pip freeze)?

    aiohttp==3.9.1
    aiosignal==1.3.1
    altair==4.2.2
    asn1crypto==1.5.1
    astroid==3.0.2
    asttokens==2.4.1
    async-timeout==4.0.3
    attrs==23.1.0
    black==23.12.0
    blinker==1.7.0
    cachetools==5.3.2
    certifi==2023.11.17
    cffi==1.16.0
    cfgv==3.4.0
    charset-normalizer==3.3.2
    click==8.1.7
    cloudpickle==2.0.0
    comm==0.2.0
    cron-descriptor==1.4.0
    cryptography==41.0.7
    debugpy==1.8.0
    decorator==5.1.1
    dill==0.3.7
    distlib==0.3.8
    entrypoints==0.4
    et-xmlfile==1.1.0
    exceptiongroup==1.2.0
    executing==2.0.1
    filelock==3.13.1
    frozenlist==1.4.0
    gitdb==4.0.11
    GitPython==3.1.40
    identify==2.5.33
    idna==3.6
    importlib-metadata==7.0.0
    ipykernel==6.27.1
    ipython==8.18.1
    isort==5.13.1
    jedi==0.19.1
    Jinja2==3.1.2
    jsonschema==4.20.0
    jsonschema-specifications==2023.11.2
    jupyter_client==8.6.0
    jupyter_core==5.5.0
    markdown-it-py==3.0.0
    MarkupSafe==2.1.3
    matplotlib-inline==0.1.6
    mccabe==0.7.0
    mdurl==0.1.2
    multidict==6.0.4
    mypy-extensions==1.0.0
    nest-asyncio==1.5.8
    nodeenv==1.8.0
    numpy==1.26.2
    openpyxl==3.1.2
    packaging==23.2
    pandas==2.0.3
    parso==0.8.3
    pathspec==0.12.1
    pexpect==4.9.0
    Pillow==10.1.0
    platformdirs==3.11.0
    pre-commit==3.6.0
    prompt-toolkit==3.0.43
    protobuf==3.20.3
    psutil==5.9.6
    ptyprocess==0.7.0
    pure-eval==0.2.2
    pyarrow==14.0.1
    pycparser==2.21
    pydeck==0.8.1b0
    Pygments==2.17.2
    PyJWT==2.8.0
    pylint==3.0.3
    Pympler==1.0.1
    pyOpenSSL==23.3.0
    python-dateutil==2.8.2
    pytz==2023.3.post1
    PyYAML==6.0.1
    pyzmq==25.1.2
    referencing==0.32.0
    requests==2.31.0
    rich==13.7.0
    rpds-py==0.13.2
    six==1.16.0
    smmap==5.0.1
    snowflake-connector-python==3.6.0
    snowflake-snowpark-python==1.7.0
    sortedcontainers==2.4.0
    stack-data==0.6.3
    streamlit==1.22.0
    tenacity==8.2.3
    toml==0.10.2
    tomli==2.0.1
    tomlkit==0.12.3
    toolz==0.12.0
    tornado==6.4
    traitlets==5.14.0
    typing_extensions==4.9.0
    tzdata==2023.3
    tzlocal==5.2
    urllib3==2.1.0
    validators==0.22.0
    virtualenv==20.25.0
    watchdog==3.0.0
    wcwidth==0.2.12
    yarl==1.9.4
    zipp==3.17.0

  4. What did you do?

    Suppose I have a custom class like the one below (a minimal example that reproduces the errors):

from snowflake.snowpark.session import Session
from snowflake.snowpark.table import Table
from snowflake.snowpark.row import Row

class JobTable(Table):

    def __init__(self, session: Session) -> None:
        super().__init__("some_schema.some_table", session)

    def get_first(self) -> Row:
        return self.collect()[0]

    def do_whatever(self) -> None:
        for _ in self.collect():
            pass

When linting this file with pylint 3.0.3, these two errors appear:

test.py:12:15: E1136: Value 'self.collect()' is unsubscriptable (unsubscriptable-object)
test.py:15:17: E1133: Non-iterable value self.collect() is used in an iterating context (not-an-iterable)

The screenshot below shows that the information about the collect method is resolved correctly, including its return type hint of List[Row]. The linter should therefore not complain, since lists are both subscriptable and iterable. In fact, if I replace every call to self.collect() with list(self.collect()), the errors go away.

[Screenshot: information about the collect method, showing the return type hint List[Row]]
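For reference, here is a minimal sketch of the list() workaround mentioned above. So that it runs without a Snowflake connection, Row and Table here are hypothetical stand-ins for snowflake.snowpark.row.Row and snowflake.snowpark.table.Table, and count_rows is an illustrative method that is not part of my original class. With the real library, only the two list(...) wrappers would change.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """Hypothetical stand-in for snowflake.snowpark.row.Row."""

    job_id: int


class Table:
    """Hypothetical stand-in for snowflake.snowpark.table.Table."""

    def collect(self) -> List[Row]:
        # The real method runs a query; this stub returns fixed rows.
        return [Row(1), Row(2)]


class JobTable(Table):
    def get_first(self) -> Row:
        # Wrapping the call in list() is what silences E1136 for me.
        rows = list(self.collect())
        return rows[0]

    def count_rows(self) -> int:
        # Same wrapper in an iterating context, which silences E1133.
        total = 0
        for _ in list(self.collect()):
            total += 1
        return total
```

The wrapper makes an extra shallow copy of the list, so it is a workaround and not a fix. An inline `# pylint: disable=unsubscriptable-object` (or `not-an-iterable`) comment on the affected line would be an alternative that avoids the copy.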

  5. What did you expect to see?

    The linter should be able to understand that the result of the .collect() method is a list of rows and, therefore, not throw the unsubscriptable and non-iterable object errors.

    Do you know where the problem is? Should I be doing something different with the type hinting and/or the linting configuration? Is this expected behavior? Thanks in advance!

  6. Can you set logging to DEBUG and collect the logs?

    import logging
    
    for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.DEBUG)
        ch = logging.StreamHandler()
        ch.setLevel(logging.DEBUG)
        ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
        logger.addHandler(ch)

    This is not a runtime error, so I don't think the logs will give us any extra information.

EloiSanchez commented 9 months ago

#1248 may be related.