unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

`@pa.dataframe_check` does not work correctly with pandas==1.1.5 #734

Closed alphavector closed 2 years ago

alphavector commented 2 years ago

Describe the bug

With pandas==1.1.5, validating a `SchemaModel` that uses `@pa.dataframe_check` raises a `KeyError` instead of the expected `SchemaErrors`.

Code Sample, a copy-pastable example

import pandas as pd
import pandera as pa

from pandera.typing import DataFrame, Series

class Schema(pa.SchemaModel):
    a: Series[pa.INT64] = pa.Field(nullable=True, coerce=True)
    b: Series[pa.INT64] = pa.Field(nullable=True, coerce=True)

    @pa.dataframe_check(ignore_na=False)
    def some_check(cls, df: pd.DataFrame) -> Series[bool]:
        return df["a"].isna() | ~df["b"].isna()

df = pd.DataFrame(
    {
        'a': [1, 2, 3, 1],
        'b': [1, 2, None, None],
    }
)

Schema.validate(df, lazy=True)

Expected behavior

`Schema.validate(df, lazy=True)` should raise `SchemaErrors`.
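To make the expectation concrete, here is a minimal sketch that evaluates the same boolean expression as `some_check` in plain pandas, without pandera. The names `passes` and `failing_rows` are introduced here for illustration only. It shows which rows the check should flag, not how pandera reports them.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 1],
        "b": [1, 2, None, None],
    }
)

# Same boolean expression as the body of `some_check`, evaluated outside pandera.
passes = df["a"].isna() | ~df["b"].isna()

# Rows where the expression is False are the ones the check should report.
failing_rows = df.index[~passes].tolist()

print(passes.tolist())
print(failing_rows)
```

The last two rows have a value in `a` but a missing value in `b`, so the expression is False for them and validation is expected to fail on those rows.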

Actual behavior

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-aebaf4815bc2> in <module>()
     15 )
     16 
---> 17 Schema.validate(df, lazy=True)

7 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1302             if raise_missing:
   1303                 not_found = list(set(key) - set(ax))
-> 1304                 raise KeyError(f"{not_found} not in index")
   1305 
   1306             # we skip the warning on Categorical

KeyError: "['index'] not in index"

Additional context

After I updated pandas to version 1.3.5, everything worked. I used Google Colab to reproduce the problem.
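Since the behaviour depends on the pandas version, it can help to confirm which version the runtime actually loads. The sketch below compares the installed version against the two versions mentioned in this report. `version_tuple` is a hypothetical helper written for this example, and nothing is known from this report about the versions between 1.1.5 and 1.3.5.

```python
import re

import pandas as pd


def version_tuple(version: str) -> tuple:
    """Return the leading (major, minor, patch) integers of a version string."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


# The only two data points from this report (assumption: nothing in between was tested).
REPORTED_FAILING = version_tuple("1.1.5")
REPORTED_WORKING = version_tuple("1.3.5")

installed = version_tuple(pd.__version__)

if installed == REPORTED_FAILING:
    print("pandas", pd.__version__, "matches the version where the KeyError was seen")
elif installed >= REPORTED_WORKING:
    print("pandas", pd.__version__, "is at or above the version that worked here")
else:
    print("pandas", pd.__version__, "was not covered by this report")
```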

python           : 3.7.12.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.144+
Version          : #1 SMP Tue Dec 7 09:58:10 PST 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8
pip freeze ``` absl-py==0.12.0 alabaster==0.7.12 albumentations==0.1.12 altair==4.2.0 appdirs==1.4.4 argcomplete==2.0.0 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arviz==0.11.4 astor==0.8.1 astropy==4.3.1 astunparse==1.6.3 atari-py==0.2.9 atomicwrites==1.4.0 attrs==21.4.0 audioread==2.1.9 autograd==1.3 Babel==2.9.1 backcall==0.2.0 beautifulsoup4==4.6.3 bleach==4.1.0 blis==0.4.1 bokeh==2.3.3 Bottleneck==1.3.2 branca==0.4.2 bs4==0.0.1 CacheControl==0.12.10 cached-property==1.5.2 cachetools==4.2.4 catalogue==1.0.0 certifi==2021.10.8 cffi==1.15.0 cftime==1.5.1.1 chardet==3.0.4 charset-normalizer==2.0.10 click==7.1.2 cloudpickle==1.3.0 cmake==3.12.0 cmdstanpy==0.9.5 colorcet==3.0.0 colorlover==0.3.0 community==1.0.0b1 contextlib2==0.5.5 convertdate==2.3.2 coverage==3.7.1 coveralls==0.5 crcmod==1.7 cufflinks==0.17.3 cvxopt==1.2.7 cvxpy==1.0.31 cycler==0.11.0 cymem==2.0.6 Cython==0.29.26 daft==0.0.4 dask==2.12.0 datascience==0.10.6 debugpy==1.0.0 decorator==4.4.2 defusedxml==0.7.1 descartes==1.1.0 dill==0.3.4 distributed==1.25.3 dlib @ file:///dlib-19.18.0-cp37-cp37m-linux_x86_64.whl dm-tree==0.1.6 docopt==0.6.2 docutils==0.17.1 dopamine-rl==1.0.5 earthengine-api==0.1.293 easydict==1.9 ecos==2.0.10 editdistance==0.5.3 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz entrypoints==0.3 ephem==4.1.3 et-xmlfile==1.1.0 fa2==0.3.5 fastai==1.0.61 fastdtw==0.3.4 fastprogress==1.0.0 fastrlock==0.8 fbprophet==0.7.1 feather-format==0.4.1 filelock==3.4.2 firebase-admin==4.4.0 fix-yahoo-finance==0.0.22 Flask==1.1.4 flatbuffers==2.0 folium==0.8.3 future==0.16.0 gast==0.4.0 GDAL==2.2.2 gdown==3.6.4 gensim==3.6.0 geographiclib==1.52 geopy==1.17.0 gin-config==0.5.0 glob2==0.7 google==2.0.3 google-api-core==1.26.3 google-api-python-client==1.12.8 google-auth==1.35.0 google-auth-httplib2==0.0.4 google-auth-oauthlib==0.4.6 google-cloud-bigquery==1.21.0 google-cloud-bigquery-storage==1.1.0 
google-cloud-core==1.0.3 google-cloud-datastore==1.8.0 google-cloud-firestore==1.7.0 google-cloud-language==1.2.0 google-cloud-storage==1.18.1 google-cloud-translate==1.5.0 google-colab @ file:///colabtools/dist/google-colab-1.0.0.tar.gz google-pasta==0.2.0 google-resumable-media==0.4.1 googleapis-common-protos==1.54.0 googledrivedownloader==0.4 graphviz==0.10.1 greenlet==1.1.2 grpcio==1.43.0 gspread==3.0.1 gspread-dataframe==3.0.8 gym==0.17.3 h5py==3.1.0 HeapDict==1.0.1 hijri-converter==2.2.2 holidays==0.10.5.2 holoviews==1.14.7 html5lib==1.0.1 httpimport==0.5.18 httplib2==0.17.4 httplib2shim==0.0.3 humanize==0.5.1 hyperopt==0.1.2 ideep4py==2.0.0.post3 idna==2.10 imageio==2.4.1 imagesize==1.3.0 imbalanced-learn==0.8.1 imblearn==0.0 imgaug==0.2.9 importlib-metadata==4.10.0 importlib-resources==5.4.0 imutils==0.5.4 inflect==2.1.0 iniconfig==1.1.1 intel-openmp==2022.0.1 intervaltree==2.1.0 ipykernel==4.10.1 ipython==5.5.0 ipython-genutils==0.2.0 ipython-sql==0.3.9 ipywidgets==7.6.5 itsdangerous==1.1.0 jax==0.2.25 jaxlib @ https://storage.googleapis.com/jax-releases/cuda111/jaxlib-0.1.71+cuda111-cp37-none-manylinux2010_x86_64.whl jdcal==1.4.1 jedi==0.18.1 jieba==0.42.1 Jinja2==2.11.3 joblib==1.1.0 jpeg4py==0.1.4 jsonschema==4.3.3 jupyter==1.0.0 jupyter-client==5.3.5 jupyter-console==5.2.0 jupyter-core==4.9.1 jupyterlab-pygments==0.1.2 jupyterlab-widgets==1.0.2 kaggle==1.5.12 kapre==0.3.6 keras==2.7.0 Keras-Preprocessing==1.1.2 keras-vis==0.4.1 kiwisolver==1.3.2 korean-lunar-calendar==0.2.1 libclang==12.0.0 librosa==0.8.1 lightgbm==2.2.3 llvmlite==0.34.0 lmdb==0.99 LunarCalendar==0.0.9 lxml==4.2.6 Markdown==3.3.6 MarkupSafe==2.0.1 matplotlib==3.2.2 matplotlib-inline==0.1.3 matplotlib-venn==0.11.6 missingno==0.5.0 mistune==0.8.4 mizani==0.6.0 mkl==2019.0 mlxtend==0.14.0 more-itertools==8.12.0 moviepy==0.2.3.5 mpmath==1.2.1 msgpack==1.0.3 multiprocess==0.70.12.2 multitasking==0.0.10 murmurhash==1.0.6 music21==5.5.0 mypy-extensions==0.4.3 natsort==5.5.0 nbclient==0.5.9 
nbconvert==5.6.1 nbformat==5.1.3 nest-asyncio==1.5.4 netCDF4==1.5.8 networkx==2.6.3 nibabel==3.0.2 nltk==3.2.5 notebook==5.3.1 numba==0.51.2 numexpr==2.8.1 numpy==1.19.5 nvidia-ml-py3==7.352.0 oauth2client==4.1.3 oauthlib==3.1.1 okgrade==0.4.3 opencv-contrib-python==4.1.2.30 opencv-python==4.1.2.30 openpyxl==2.5.9 opt-einsum==3.3.0 osqp==0.6.2.post0 packaging==21.3 palettable==3.3.0 pandas==1.1.5 pandas-datareader==0.9.0 pandas-gbq==0.13.3 pandas-profiling==1.4.1 pandera @ git+https://github.com/pandera-dev/pandera.git@8de4953f1506a7a4cf607afcce36d6f48070329c pandocfilters==1.5.0 panel==0.12.1 param==1.12.0 parso==0.8.3 pathlib==1.0.1 patsy==0.5.2 pep517==0.12.0 pexpect==4.8.0 pickleshare==0.7.5 Pillow==7.1.2 pip-tools==6.2.0 plac==1.1.3 plotly==4.4.1 plotnine==0.6.0 pluggy==0.7.1 pooch==1.5.2 portpicker==1.3.9 prefetch-generator==1.0.1 preshed==3.0.6 prettytable==3.0.0 progressbar2==3.38.0 prometheus-client==0.12.0 promise==2.3 prompt-toolkit==1.0.18 protobuf==3.17.3 psutil==5.4.8 psycopg2==2.7.6.1 ptyprocess==0.7.0 py==1.11.0 pyarrow==3.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycocotools==2.0.3 pycparser==2.21 pyct==0.4.8 pydata-google-auth==1.3.0 pydot==1.3.0 pydot-ng==2.0.0 pydotplus==2.0.2 PyDrive==1.3.1 pyemd==0.5.1 pyerfa==2.0.0.1 pyglet==1.5.0 Pygments==2.6.1 pygobject==3.26.1 pymc3==3.11.4 PyMeeus==0.5.11 pymongo==4.0.1 pymystem3==0.2.0 PyOpenGL==3.1.5 pyparsing==3.0.6 pyrsistent==0.18.0 pysndfile==1.3.8 PySocks==1.7.1 pystan==2.19.1.1 pytest==3.6.4 python-apt==0.0.0 python-chess==0.23.11 python-dateutil==2.8.2 python-louvain==0.15 python-slugify==5.0.2 python-utils==3.0.0 pytz==2018.9 pyviz-comms==2.1.0 PyWavelets==1.2.0 PyYAML==3.13 pyzmq==22.3.0 qdldl==0.1.5.post0 qtconsole==5.2.2 QtPy==2.0.0 regex==2019.12.20 requests==2.23.0 requests-oauthlib==1.3.0 resampy==0.2.2 retrying==1.3.3 rpy2==3.4.5 rsa==4.8 scikit-image==0.18.3 scikit-learn==1.0.2 scipy==1.4.1 screen-resolution-extra==0.0.0 scs==3.0.1 seaborn==0.11.2 semver==2.13.0 Send2Trash==1.8.0 
setuptools-git==1.2 Shapely==1.8.0 simplegeneric==0.8.1 six==1.15.0 sklearn==0.0 sklearn-pandas==1.8.0 smart-open==5.2.1 snowballstemmer==2.2.0 sortedcontainers==2.4.0 SoundFile==0.10.3.post1 spacy==2.2.4 Sphinx==1.8.6 sphinxcontrib-serializinghtml==1.1.5 sphinxcontrib-websupport==1.2.4 SQLAlchemy==1.4.29 sqlparse==0.4.2 srsly==1.0.5 statsmodels==0.10.2 sympy==1.7.1 tables==3.4.4 tabulate==0.8.9 tblib==1.7.0 tensorboard==2.7.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.1 tensorflow @ file:///tensorflow-2.7.0-cp37-cp37m-linux_x86_64.whl tensorflow-datasets==4.0.1 tensorflow-estimator==2.7.0 tensorflow-gcs-config==2.7.0 tensorflow-hub==0.12.0 tensorflow-io-gcs-filesystem==0.23.1 tensorflow-metadata==1.5.0 tensorflow-probability==0.15.0 termcolor==1.1.0 terminado==0.12.1 testpath==0.5.0 text-unidecode==1.3 textblob==0.15.3 Theano-PyMC==1.1.2 thinc==7.4.0 threadpoolctl==3.0.0 tifffile==2021.11.2 toml==0.10.2 tomli==2.0.0 toolz==0.11.2 torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl torchaudio @ https://download.pytorch.org/whl/cu111/torchaudio-0.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl torchsummary==1.5.1 torchtext==0.11.0 torchvision @ https://download.pytorch.org/whl/cu111/torchvision-0.11.1%2Bcu111-cp37-cp37m-linux_x86_64.whl tornado==5.1.1 tqdm==4.62.3 traitlets==5.1.1 tweepy==3.10.0 typeguard==2.7.1 typing-extensions==3.10.0.2 typing-inspect==0.7.1 tzlocal==1.5.1 uritemplate==3.0.1 urllib3==1.24.3 vega-datasets==0.9.0 wasabi==0.9.0 wcwidth==0.2.5 webencodings==0.5.1 Werkzeug==1.0.1 widgetsnbextension==3.5.2 wordcloud==1.5.0 wrapt==1.13.3 xarray==0.18.2 xgboost==0.90 xkit==0.0.0 xlrd==1.1.0 xlwt==1.3.0 yellowbrick==1.3.post1 zict==2.0.0 zipp==3.7.0 ```
cosmicBboy commented 2 years ago

Thanks for finding this bug, @alphavector! #735 should fix it. It also adds pandas 1.1.5 to CI. I've been avoiding adding more pandas versions to the CI matrix until something came up... and it did :)

cosmicBboy commented 2 years ago

fixed by #735