SNOW-542421: Parallel Fetch with fetch_pandas_all results in duplicate index values. #1061

Closed jacksonrnewhouse closed 2 years ago

jacksonrnewhouse commented 2 years ago

  1. What version of Python are you using?

Python 3.7.5 (default, Dec 9 2021, 17:04:37) [GCC 8.4.0]

  1. What operating system and processor architecture are you using?


  1. What are the component versions in the environment (pip freeze)?

Package Version

  1. What did you do?


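import snowflake.connector
from snowflake.connector import DictCursor

# CONNECTION is a dict of connection keyword arguments, defined elsewhere
conn = snowflake.connector.connect(**CONNECTION)
cursor = conn.cursor(DictCursor)
result = cursor.execute("select * FROM big_table")
# Fetch the entire result set as a single pandas DataFrame
df = result.fetch_pandas_all()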
# 298505
# 16628
# 44
# 44

  1. What did you expect to see?

I expect the resulting pandas DataFrame to have a unique index, because downstream processing relies on this. The change was almost certainly introduced in #787, which shipped in 2.6; it surfaced for me when we upgraded from 2.4.0 to 2.7.2. It can be mitigated by calling reset_index(drop=True) on the resulting DataFrame, but it was an unexpected deviation from past behavior.
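
For anyone hitting the same thing, here is a minimal sketch of the symptom and the reset_index(drop=True) mitigation. It does not touch Snowflake at all: the two small frames are hypothetical stand-ins for result batches, and joining them with pd.concat is only an assumption about how repeated index labels could arise, not the connector's actual code.

```python
import pandas as pd

# Hypothetical stand-ins for two result batches; each has its own 0-based index.
batch_a = pd.DataFrame({"id": [1, 2, 3]})
batch_b = pd.DataFrame({"id": [4, 5]})

# Concatenating without ignore_index=True keeps each batch's labels,
# so 0 and 1 each appear twice in the combined index.
df = pd.concat([batch_a, batch_b])
has_duplicates = not df.index.is_unique

# Mitigation from the report: drop the old index and renumber from 0.
fixed = df.reset_index(drop=True)
```

Checking df.index.is_unique right after fetch_pandas_all() is a cheap way to detect the problem before downstream code relies on index labels.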

sfc-gh-mkeller commented 2 years ago

This should have been fixed by #1068. Please reopen if this is not the case.