Please answer these questions before submitting your issue. Thanks!
Python 3.7.5 (default, Dec 9 2021, 17:04:37) [GCC 8.4.0]
Linux-5.4.117-58.216.amzn2.x86_64-x86_64-with-Ubuntu-18.04-bionic
pip freeze
Package Version
absl-py 1.0.0 aerospike 4.0.0 aiobotocore 1.2.1 aiohttp 3.8.1 aioitertools 0.8.0 aiosignal 1.2.0 alembic 1.7.5 ansiwrap 0.8.4 anyio 3.5.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 asn1crypto 1.4.0 astor 0.8.1 astroid 2.5 async-generator 1.10 async-timeout 4.0.2 asynctest 0.13.0 attrs 20.3.0 Authlib 0.15.5 autopep8 1.5.5 awscli 1.18.212 azure-common 1.1.26 azure-core 1.21.1 azure-storage-blob 12.7.1 backcall 0.2.0 banal 1.0.6 beautifulsoup4 4.9.3 behave 1.2.6 behave-pandas 0.3.0 bitarray 1.6.3 black 21.12b0 bleach 4.1.0 blis 0.4.1 bokeh 2.2.2 boto3 1.16.52 botocore 1.19.52 bqplot 0.12.18 branca 0.3.1 brandprotobuf 3.6.2 Brotli 1.0.9 bs4 0.0.1 cached-property 1.5.2 cachetools 4.2.4 certifi 2021.10.8 certipy 0.1.3 cffi 1.15.0 chardet 3.0.4 charset-normalizer 2.0.10 chart-studio 1.0.0 click 7.1.2 cloudpickle 2.0.0 cmdstanpy 0.9.5 colorama 0.4.3 colorlover 0.3.0 convertdate 2.3.2 creative-lifecycle-manager 1.119.0 cron-descriptor 1.2.24 cryptography 35.0.0 cufflinks 0.17.3 cycler 0.11.0 cymem 2.0.6 Cython 0.29.26 dash 2.0.0 dash-bootstrap-components 0.10.7 dash-core-components 2.0.0 dash-html-components 2.0.0 dash-qc-components 1.0.0 dash-table 5.0.0 dask 2022.1.0 dask-kubernetes 0.11.0 datadog 0.26.0 dataset 1.5.2 dateparser 0.7.0 debugpy 1.5.1 decorator 5.1.1 defusedxml 0.7.1 Deprecated 1.2.13 df2gspread 1.0.4 distributed 2022.1.0 distro 1.6.0 docutils 0.15.2 elasticsearch 7.9.1 entrypoints 0.3 ephem 4.1.3 experimentfr-analytics 0.1.81 experimentfr-grpc 0.1.81 falcon 3.0.1 fbprophet 0.7.1 findspark 1.4.2 flake8 3.8.3 Flask 1.1.4 Flask-Caching 1.9.0 Flask-Compress 1.10.1 freezegun 1.1.0 frozenlist 1.3.0 fsspec 0.8.7 future 0.18.2 futures 3.1.1 gast 0.2.2 gitdb 4.0.9 GitPython 3.1.26 Glances 3.1.5 google-api-python-client 1.6.7 google-auth 1.35.0 google-auth-oauthlib 0.4.6 google-pasta 0.2.0 googleapis-common-protos 1.6.0 grpc-graphql-gateway-proto 0.46.0 grpcio 1.30.0 grpcio-health-checking 1.30.0 grpcio-status 1.30.0 grpcio-tools 1.30.0 gspread 5.1.1 gunicorn 
20.1.0 h5py 3.6.0 HeapDict 1.0.1 hijri-converter 2.2.2 holidays 0.12 httplib2 0.20.2 hvac 0.9.6 idna 3.3 importlib-metadata 4.10.1 importlib-resources 5.4.0 iniconfig 1.1.1 invoke 1.6.0 ipydatawidgets 4.2.0 ipykernel 6.7.0 ipyleaflet 0.11.2 ipython 7.31.0 ipython-genutils 0.2.0 ipywidgets 7.6.5 isodate 0.6.1 isort 5.10.1 itsdangerous 1.1.0 jedi 0.17.2 Jinja2 2.11.3 jmespath 0.10.0 joblib 1.1.0 json5 0.9.6 jsonschema 4.4.0 jupyter-client 7.1.1 jupyter-console 6.2.0 jupyter-contrib-core 0.3.3 jupyter-contrib-nbextensions 0.5.1 jupyter-core 4.6.3 jupyter-highlight-selected-word 0.2.0 jupyter-kernel-gateway 2.4.3 jupyter-latex-envs 1.4.6 jupyter-lsp 0.9.2 jupyter-nbextensions-configurator 0.4.1 jupyter-server 1.13.3 jupyter-server-proxy 3.2.0 jupyter-telemetry 0.1.0 jupyterhub 2.0.2 jupyterlab 2.2.8 jupyterlab-code-formatter 1.3.6 jupyterlab-dash 0.1.0a3 jupyterlab-git 0.22.1 jupyterlab-iframe 0.2.3 jupyterlab-launcher 0.13.1 jupyterlab-pygments 0.1.2 jupyterlab-server 1.2.0 jupyterlab-widgets 1.0.2 jupytext 1.6.0 kazoo 2.8.0 kazurator 0.2.0 Keras 2.3.1 Keras-Applications 1.0.8 Keras-Preprocessing 1.1.2 kgrpc 1.0.13 kiwisolver 1.3.2 koalas 1.3.0 korean-lunar-calendar 0.2.1 kubernetes 17.17.0 kubernetes-asyncio 19.15.0 lazy-object-proxy 1.7.1 ldap3 2.8.1 llvmlite 0.32.1 locket 0.2.1 LunarCalendar 0.0.9 lxml 4.7.1 Mako 1.1.6 Markdown 3.3.6 markdown-it-py 0.5.8 MarkupSafe 2.0.1 matplotlib 3.2.2 matplotlib-inline 0.1.3 mccabe 0.6.1 metadata-parser 0.9.23 mistune 0.8.4 mmh3 2.5.1 mpmath 1.2.1 msgpack 1.0.3 msrest 0.6.21 multidict 5.2.0 murmurhash 1.0.6 mypy-extensions 0.4.3 nbclient 0.5.10 nbconvert 6.4.0 nbdime 2.1.1 nbformat 5.1.3 nbresuse 0.3.6 nest-asyncio 1.5.4 nltk 3.4.1 notebook 6.4.7 notification-center 0.12.0 numba 0.49.1 numpy 1.18.5 oauth2client 4.1.3 oauthenticator 0.8.0 oauthlib 3.1.1 opt-einsum 3.3.0 oscrypto 1.2.1 packaging 21.3 pamela 1.0.0 pandas 1.1.5 pandocfilters 1.5.0 papermill 2.3.0 parse 1.19.0 parse-type 0.5.2 parso 0.7.1 partd 0.3.10 pathspec 0.9.0 
patsy 0.5.2 perspective-dash-component 0.0.7 perspective-python 1.1.0 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.0.0 pip 21.3.1 pipdeptree 2.0.0 plac 1.1.3 platformdirs 2.4.1 plotly 5.5.0 plotly-express 0.4.1 pluggy 1.0.0 preshed 3.0.6 prometheus-client 0.12.0 prompt-toolkit 3.0.24 protobuf 3.13.0 psutil 5.9.0 psycopg2-binary 2.8.6 ptyprocess 0.7.0 py 1.11.0 py4j 0.10.9 pyarrow 6.0.1 pyasn1 0.4.8 pyasn1-modules 0.2.8 pybuilder 0.12.6 pycodestyle 2.6.0 pycparser 2.21 pycryptodomex 3.10.1 pycurl 7.44.1 pydocstyle 5.1.1 pyflakes 2.2.0 PyGithub 1.55 Pygments 2.11.2 PyGObject 3.26.1 PyHamcrest 1.9.0 PyJWT 2.3.0 pylint 2.7.1 pymc3 3.5 PyMeeus 0.5.11 PyNaCl 1.5.0 pyOpenSSL 21.0.0 pyparsing 3.0.6 pyrsistent 0.18.1 PySocks 1.6.8 pyspark 3.1.2 pystan 2.19.0.0 pytest 6.2.5 python-apt 1.6.5+ubuntu0.7 python-dateutil 2.8.2 python-json-logger 2.0.2 python-jsonrpc-server 0.4.0 python-language-server 0.35.1 python-oauth2 1.1.0 python-pptx 0.6.18 python-snappy 0.6.0 pythreejs 2.2.1 pytz 2021.3 PyYAML 5.3.1 pyzmq 22.3.0 regex 2020.11.13 requests 2.27.1 requests-futures 1.0.0 requests-oauthlib 1.3.0 requests-toolbelt 0.9.1 retry 0.9.2 retrying 1.3.3 rope 0.18.0 rsa 4.5 rtb-deployer-client 1.0b254 rtb-deployer-grpc 1.0b129 ruamel.yaml 0.17.20 ruamel.yaml.clib 0.2.6 s3fs 0.4.2 s3transfer 0.3.7 scikit-learn 1.0.2 scipy 1.4.1 seaborn 0.11.0 Send2Trash 1.8.0 setuptools 60.8.1 setuptools-git 1.2 simpervisor 0.4 simple-salesforce 1.10.1 simplegeneric 0.8.1 six 1.16.0 slacker 0.14.0 smart-open 5.0.0 smmap 5.0.0 sniffio 1.2.0 snowballstemmer 2.2.0 snowflake-connector-python 2.7.2 snowflake-sqlalchemy 1.2.4 sortedcontainers 2.4.0 soupsieve 2.3.1 spacy 2.2.2 sparkhub-client 1.0.235 sparkmeasure 0.14.0 sparkmonitor 1.1.0 SQLAlchemy 1.3.23 sqlparse 0.2.4 srsly 1.0.5 ssh-import-id 5.11 statsmodels 0.13.1 survey-configuration-service 0.28.0 sympy 1.3 tabulate 0.8.9 tblib 1.7.0 tenacity 8.0.1 tensorboard 2.1.1 tensorflow 2.1.0 tensorflow-estimator 2.1.0 tensorflow-gpu 2.1.0 termcolor 1.1.0 terminado 
0.12.1 testpath 0.5.0 textwrap3 0.9.2 Theano 1.0.5 thinc 7.3.1 thirdparty-protoc-gen-validate 0.3.0 threadpoolctl 3.0.0 timeago 1.0.15 tinys3 0.1.12 toml 0.10.2 tomli 1.2.3 toolz 0.11.2 toposort 1.6 torch 1.6.0 torchtext 0.2.3 tornado 6.1 tornado-proxy-handlers 0.0.5 tqdm 4.62.3 traitlets 5.1.1 traittypes 0.2.1 typed-ast 1.4.3 typing_extensions 4.0.1 tzlocal 2.1 ua-parser 0.8.0 ujson 5.1.0 unattended-upgrades 0.1 uritemplate 3.0.1 urllib3 1.26.8 wasabi 0.9.0 wcwidth 0.2.5 webencodings 0.5.1 websocket-client 0.53.0 Werkzeug 1.0.1 wheel 0.37.1 widgetsnbextension 3.5.2 wrapt 1.12.1 xarray 0.20.2 xeus-python 0.11.2 xgboost 1.2.1 XlsxWriter 3.0.2 xlwt 1.3.0 yapf 0.30.0 yarl 1.7.2 zict 2.0.0 zipp 3.7.0 zope.interface 4.7.2
ran
I expect the resulting pandas DataFrame to have an index with no duplicate values, because downstream processing relies on that. The change was almost surely introduced in #787, which was rolled out in 2.6. It surfaced for me when we bumped from 2.4.0 to 2.7.2. It can be mitigated by calling `reset_index(drop=True)` on the resulting DataFrame, but it was definitely an unexpected deviation from past behavior.
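A minimal sketch of the symptom and the workaround, using plain pandas only. It assumes the duplicate index comes from concatenating per-batch frames without renumbering the rows; I have not confirmed that this is what the connector does internally. The frames and `combine_batches` are hypothetical stand-ins, not connector APIs.

```python
import pandas as pd


def combine_batches(batches):
    """Concatenate frames while keeping each frame's own index (assumed cause)."""
    return pd.concat(batches)


# Hypothetical stand-ins for two result batches; each starts its index at 0.
batch_a = pd.DataFrame({"id": [1, 2, 3]})
batch_b = pd.DataFrame({"id": [4, 5]})

combined = combine_batches([batch_a, batch_b])
# The index labels 0 and 1 now each appear twice.
has_duplicates = combined.index.has_duplicates

# Workaround from this report: discard the old index and renumber the rows.
fixed = combined.reset_index(drop=True)
is_unique_after_fix = fixed.index.is_unique
```

With `drop=True` the old index is discarded rather than kept as an extra column, so the data itself is unchanged.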