snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-1661619: Local testing - incorrect behavior when joining and filtering on "isin" #2283

Open orrdermer1 opened 1 month ago

orrdermer1 commented 1 month ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.10.13

  2. What operating system and processor architecture are you using?

    macOS-14.6.1-arm64-arm-64bit

  3. What are the component versions in the environment (pip freeze)?

    ...Snowpark 1.22.0

  4. What did you do?

from snowflake.snowpark import Session
session = Session.builder.config("local_testing", True).create()

users = session.create_dataframe(
    [
        {"user_id": 1, "username": "Alice"},
        {"user_id": 2, "username": "Bob"},
        {"user_id": 3, "username": "Charlie"},
    ]
)

group_memberships = session.create_dataframe(
    [
        {"group_id": 1, "user_id": 1, "status": "Active"},
        {"group_id": 2, "user_id": 1, "status": "Active"},
        {"group_id": 1, "user_id": 2, "status": "Disabled"},
        {"group_id": 2, "user_id": 2, "status": "Active"},
        {"group_id": 1, "user_id": 3, "status": "Active"},
    ]
)

df = (
    users
    .join(group_memberships, users["user_id"] == group_memberships["user_id"])
    .select(users["username"], group_memberships["group_id"], group_memberships["status"])
)

df.show()
"""
--------------------------------------
|"USERNAME"  |"GROUP_ID"  |"STATUS"  |
--------------------------------------
|Alice       |1           |Active    |
|Alice       |2           |Active    |
|Bob         |1           |Disabled  |
|Bob         |2           |Active    |
|Charlie     |1           |Active    |
--------------------------------------
"""

df.where(df["status"].isin(["Active"])).show()
"""
--------------------------------------
|"USERNAME"  |"GROUP_ID"  |"STATUS"  |
--------------------------------------
|Alice       |1           |Active    |
|Alice       |2           |Active    |
--------------------------------------
"""

  5. What did you expect to see?

The final "where" drops 2 records that we still expect to see: Bob in group 2 and Charlie in group 1, both with status "Active". A "regular" Snowpark session behaves correctly; only local testing is affected.
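For reference, the expected rows can be worked out by hand without Snowpark. This is a minimal sketch in plain Python, assuming an inner join on user_id followed by a membership test on status; the helper name `expected_rows` is hypothetical and not part of any Snowpark API.

```python
# Same data as in the repro above, as plain Python dicts.
users = [
    {"user_id": 1, "username": "Alice"},
    {"user_id": 2, "username": "Bob"},
    {"user_id": 3, "username": "Charlie"},
]

group_memberships = [
    {"group_id": 1, "user_id": 1, "status": "Active"},
    {"group_id": 2, "user_id": 1, "status": "Active"},
    {"group_id": 1, "user_id": 2, "status": "Disabled"},
    {"group_id": 2, "user_id": 2, "status": "Active"},
    {"group_id": 1, "user_id": 3, "status": "Active"},
]


def expected_rows(allowed_statuses):
    """Inner-join users to memberships on user_id, then keep allowed statuses.

    Hypothetical helper for illustration only; it mimics join + isin by hand.
    """
    return [
        (u["username"], g["group_id"], g["status"])
        for u in users
        for g in group_memberships
        if u["user_id"] == g["user_id"] and g["status"] in allowed_statuses
    ]


rows = expected_rows(["Active"])
for row in rows:
    print(row)
```

This yields four rows, so the two-row result from local testing is missing the Bob and Charlie memberships.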

  6. Can you set logging to DEBUG and collect the logs?

    Nothing shows during local testing.

sfc-gh-sghosh commented 1 month ago

Hello @orrdermer1 ,

Thanks for raising the issue. We are investigating and will update.

Regards, Sujan

sfc-gh-sghosh commented 1 month ago

Hello @orrdermer1 ,

We are able to reproduce the issue with local_testing and will work on eliminating it. We will update.

Regards, Sujan