ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License
12.36k stars, 1.67k forks

adding config_file = 'config_default.yaml' (without changing the config) results in less warnings #827

Open DanilZherebtsov opened 2 years ago

DanilZherebtsov commented 2 years ago

Describe the bug

Passing the config_file argument to ProfileReport, pointing at an unmodified copy of config_default.yaml, produces a report with far fewer warnings than omitting the argument.

P.S. I want to edit config_default.yaml to exclude irrelevant information, but I noticed that even without editing it, simply passing the path to the original config as an argument changes the resulting report.

Without the config_file argument, the report produces 33 warnings.
With config_file pointing to the untouched config_default.yaml, the report produces 14 warnings.
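One way to narrow down a discrepancy like this is to compare the effective settings of the two runs key by key. The following is a minimal sketch, not part of the library: it assumes both configurations are already available as nested dictionaries (for example by loading the YAML file with a YAML parser, which is not shown here). The helper names `flatten` and `diff_settings` and the keys in the example are hypothetical.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a key that exists on one side only


def flatten(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-sections.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_settings(
    left: Dict[str, Any], right: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (left_value, right_value)} for every differing key."""
    flat_left, flat_right = flatten(left), flatten(right)
    diffs: Dict[str, Tuple[Any, Any]] = {}
    for path in sorted(set(flat_left) | set(flat_right)):
        a = flat_left.get(path, MISSING)
        b = flat_right.get(path, MISSING)
        if a != b:
            diffs[path] = (a, b)
    return diffs


if __name__ == "__main__":
    # Made-up settings for illustration only; these are not real config keys.
    without_file = {"section": {"threshold": 0.9, "enabled": True}}
    with_file = {"section": {"threshold": 0.5, "enabled": True}}
    for key, (a, b) in diff_settings(without_file, with_file).items():
        print(f"{key}: {a!r} != {b!r}")
```

Any key reported by `diff_settings` is a candidate explanation for the difference in warning counts; an empty result would suggest the two configurations are equivalent and the cause lies elsewhere.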

To Reproduce

Data: the Titanic train dataset downloaded from Kaggle: https://www.kaggle.com/c/titanic/data?select=train.csv

Code:

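import pandas as pd
from pandas_profiling import ProfileReport


def test_issueXXX():
    # Load the Titanic train.csv (replace <file> with the local path).
    df = pd.read_csv(r"<file>")

    # Passing config_file, even for an unmodified config_default.yaml,
    # is what changes the number of warnings in the report.
    report = ProfileReport(
        df,
        minimal=False,
        progress_bar=False,
        config_file="/Users/user/Desktop/config_default.yaml",
    )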

Version information

* Python version: 3.7.6
* Environment: Spyder
* pip:

absl-py==0.12.0 adanet==0.9.0 alabaster==0.7.12 altair==4.1.0 anaconda-client==1.7.2 anaconda-navigator==1.9.12 anyio @ file:///opt/concourse/worker/volumes/live/4ff95164-2ca6-4efb-5c02-5afee907620d/volume/anyio_1617783322708/work/dist appdirs==1.4.4 appnope @ file:///opt/concourse/worker/volumes/live/4f734db2-9ca8-4d8b-5b29-6ca15b4b4772/volume/appnope_1606859466979/work argon2-cffi @ file:///opt/concourse/worker/volumes/live/4afd07c8-7fc3-4a09-6326-d8c70269eb33/volume/argon2-cffi_1613037490059/work astor==0.8.1 astunparse==1.6.3 async-generator==1.10 attrs @ file:///tmp/build/80754af9/attrs_1604765588209/work Babel @ file:///tmp/build/80754af9/babel_1607110387436/work backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work backports.functools-lru-cache @ file:///tmp/build/80754af9/backports.functools_lru_cache_1618170165463/work backports.tempfile @ file:///home/linux1/recipes/ci/backports.tempfile_1610991236607/work backports.weakref==1.0.post1 backports.zoneinfo==0.2.1 base58==2.1.0 beautifulsoup4 @ file:///home/linux1/recipes/ci/beautifulsoup4_1610988766420/work bleach @ file:///tmp/build/80754af9/bleach_1612211392645/work blinker==1.4 blis==0.7.4 boto3==1.17.74 botocore==1.20.74 Bottleneck==1.3.2 Brotli==1.0.9 brotlipy==0.7.0 cached-property==1.5.2 cachetools==4.2.1 catalogue==2.0.6 catalyst==20.12 certifi==2020.12.5 cffi==1.14.0 chardet @ file:///opt/concourse/worker/volumes/live/9efbf151-b45b-463d-6340-a5c399bf00b7/volume/chardet_1607706825988/work click @ file:///home/linux1/recipes/ci/click_1610990599742/work cloudpickle @ file:///tmp/build/80754af9/cloudpickle_1598884132938/work clyent==1.2.2 colorama==0.4.4 combo==0.1.2 commonmark==0.9.1 conda==4.10.1 conda-build==3.18.11 conda-package-handling @ file:///opt/concourse/worker/volumes/live/d106838d-eaa7-40fd-5437-9d95a7db5458/volume/conda-package-handling_1618262135990/work conda-verify==3.4.2 coverage==4.5.4
cryptography @ file:///opt/concourse/worker/volumes/live/cdf8ff17-b0bf-4081-524c-b5a0afe929ba/volume/cryptography_1616769280208/work cycler==0.10.0 cymem==2.0.5 dash==1.20.0 dash-core-components==1.16.0 dash-html-components==1.1.3 dash-renderer==1.9.1 dash-table==4.11.3 decorator @ file:///tmp/build/80754af9/decorator_1617916966915/work defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work distlib==0.3.1 dm-tree==0.1.6 docutils==0.17.1 entrypoints==0.3 et-xmlfile==1.1.0 fastai==2.5.2 fastcore==1.3.26 fastdownload==0.0.5 fastprogress==1.0.0 filelock @ file:///home/linux1/recipes/ci/filelock_1610993975404/work Flask==2.0.1 Flask-Compress==1.10.1 flatbuffers==1.12 future==0.18.2 gast==0.4.0 gitdb==4.0.7 GitPython==3.1.18 glob2 @ file:///home/linux1/recipes/ci/glob2_1610991677669/work google-auth==1.29.0 google-auth-oauthlib==0.4.4 google-pasta==0.2.0 grpcio==1.34.1 h5py==3.1.0 htmlmin==0.1.12 idna @ file:///home/linux1/recipes/ci/idna_1610986105248/work ImageHash==4.2.1 imagesize==1.2.0 importlib-metadata @ file:///opt/concourse/worker/volumes/live/4e25a0be-45cc-4b73-4314-4604f00e30a4/volume/importlib-metadata_1617877365451/work ipykernel @ file:///opt/concourse/worker/volumes/live/73e8766c-12c3-4f76-62a6-3dea9a7da5b7/volume/ipykernel_1596206701501/work/dist/ipykernel-5.3.4-py3-none-any.whl ipython @ file:///opt/concourse/worker/volumes/live/f33fa11b-d908-43e8-693a-f4b58d8c695c/volume/ipython_1617120878195/work ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work ipywidgets @ file:///tmp/build/80754af9/ipywidgets_1610481889018/work itsdangerous==2.0.1 jedi==0.17.0 Jinja2==3.0.1 jmespath==0.10.0 joblib==1.0.1 json5==0.9.5 jsonschema @ file:///tmp/build/80754af9/jsonschema_1602607155483/work jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1616770841739/work jupyter-core @ file:///opt/concourse/worker/volumes/live/a699b83f-e941-4170-5136-bf87e3f37756/volume/jupyter_core_1612213304212/work jupyter-packaging @ 
file:///tmp/build/80754af9/jupyter-packaging_1613502826984/work jupyter-server @ file:///opt/concourse/worker/volumes/live/8c909028-d80d-4303-5883-52f907a3cf74/volume/jupyter_server_1616084052708/work jupyterlab @ file:///tmp/build/80754af9/jupyterlab_1619133235951/work jupyterlab-pygments @ file:///tmp/build/80754af9/jupyterlab_pygments_1601490720602/work jupyterlab-server @ file:///tmp/build/80754af9/jupyterlab_server_1617134334258/work jupyterlab-widgets @ file:///tmp/build/80754af9/jupyterlab_widgets_1609884341231/work kaggle==1.5.12 Keras==2.4.3 keras-nightly==2.5.0.dev2021032900 Keras-Preprocessing==1.1.2 keyring==23.0.1 kiwisolver==1.3.1 libarchive-c @ file:///tmp/build/80754af9/python-libarchive-c_1617780486945/work llvmlite==0.36.0 Markdown==3.3.4 MarkupSafe==2.0.1 matplotlib==3.3.4 missingno==0.5.0 mistune==0.8.4 mlxtend==0.18.0 mock==3.0.5 modin==0.9.1 multimethod==1.4 murmurhash==1.0.5 navigator-updater==0.2.1 nbclassic @ file:///tmp/build/80754af9/nbclassic_1616085367084/work nbclient @ file:///tmp/build/80754af9/nbclient_1614364831625/work nbconvert @ file:///opt/concourse/worker/volumes/live/d4b0787b-b6c8-4d28-5453-3381885d5b33/volume/nbconvert_1601914848300/work nbformat @ file:///tmp/build/80754af9/nbformat_1617383369282/work nest-asyncio @ file:///tmp/build/80754af9/nest-asyncio_1613680548246/work networkx==2.6.2 nltk==3.4.5 nose==1.3.7 notebook @ file:///opt/concourse/worker/volumes/live/cc183aca-5b6b-47b3-7a6f-532a4cbcfbc1/volume/notebook_1616443452879/work numba==0.53.1 numexpr==2.7.3 numpy==1.19.5 oauthlib==3.1.0 olefile==0.46 openpyxl==3.0.7 opt-einsum==3.3.0 packaging @ file:///tmp/build/80754af9/packaging_1611952188834/work pandas==1.3.0 pandas-profiling==3.0.0 pandocfilters @ file:///opt/concourse/worker/volumes/live/315ac9bb-93fd-4adc-6795-345692fcfaed/volume/pandocfilters_1605120446899/work parso @ file:///tmp/build/80754af9/parso_1617223946239/work pathy==0.6.0 patsy==0.5.1 permetrics==1.1.3 pexpect @ 
file:///tmp/build/80754af9/pexpect_1605563209008/work phik==0.12.0 pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work Pillow==7.1.1 pkginfo==1.7.0 plotly==4.14.3 preshed==3.0.5 prometheus-client @ file:///tmp/build/80754af9/prometheus_client_1618088486455/work prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1616415428029/work protobuf==3.15.8 psutil @ file:///opt/concourse/worker/volumes/live/8e01e0e9-ea07-4efa-7afb-fae37c1b9faa/volume/psutil_1612298009056/work ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl pyarrow==4.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycosat==0.6.3 pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work pydantic==1.8.2 pydeck==0.6.2 Pygments @ file:///tmp/build/80754af9/pygments_1615143339740/work pyhealth==0.0.6 pymystem3==0.2.0 pyod==0.8.9 pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work pyparsing @ file:///home/linux1/recipes/ci/pyparsing_1610983426697/work pyrsistent @ file:///opt/concourse/worker/volumes/live/656e0c1b-ef87-4251-4a51-1290b2351993/volume/pyrsistent_1600141745371/work PySocks @ file:///opt/concourse/worker/volumes/live/ef943889-94fc-4539-798d-461c60b77804/volume/pysocks_1605305801690/work python-dateutil @ file:///home/ktietz/src/ci/python-dateutil_1611928101742/work python-slugify==5.0.2 pytorch-pretrained-bert==0.6.2 pytz @ file:///tmp/build/80754af9/pytz_1612215392582/work PyWavelets==1.1.1 pywebio==1.2.3 PyYAML==5.4.1 pyzmq==20.0.0 QtPy==1.9.0 readme-renderer==29.0 recommonmark==0.7.1 rednose==1.3.0 regex==2021.4.4 requests @ file:///tmp/build/80754af9/requests_1608241421344/work requests-oauthlib==1.3.0 requests-toolbelt==0.9.1 retrying==1.3.3 rfc3986==1.5.0 rsa==4.7.2 ruamel-yaml-conda @ file:///opt/concourse/worker/volumes/live/da6f10aa-e617-4894-45a9-cfdf5da681c3/volume/ruamel_yaml_1616016690897/work s3transfer==0.4.2 sacremoses==0.0.45 scikit-learn==0.22.1 scipy==1.7.1 seaborn==0.11.1 
Send2Trash @ file:///tmp/build/80754af9/send2trash_1607525499227/work sentencepiece==0.1.96 Shapely==1.7.1 shapley==1.0.1 six @ file:///opt/concourse/worker/volumes/live/f983ba11-c9fe-4dff-7ce7-d89b95b09771/volume/six_1605205318156/work sklearn==0.0 smart-open==5.2.1 smmap==4.0.0 sniffio @ file:///opt/concourse/worker/volumes/live/838fa2d9-a35b-4591-50ce-1f1a39baa1df/volume/sniffio_1614030463440/work snowballstemmer==2.1.0 soupsieve @ file:///tmp/build/80754af9/soupsieve_1616183228191/work spacy==3.1.2 spacy-legacy==3.0.8 Sphinx==4.1.2 sphinxcontrib-applehelp==1.0.2 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 spyder-kernels @ file:///opt/concourse/worker/volumes/live/12c4a3a1-2064-489a-6dba-a41b20e966e5/volume/spyder-kernels_1617396556108/work srsly==2.4.1 statsmodels==0.12.2 streamlit==0.86.0 tables==3.6.1 tangled-up-in-unicode==0.1.0 tensorboard==2.5.0 tensorboard-data-server==0.6.0 tensorboard-plugin-wit==1.8.0 tensorboardX==2.2 tensorflow==2.5.0 tensorflow-estimator==2.5.0 tensorflow-probability==0.13.0 termcolor==1.1.0 terminado==0.9.4 termstyle==0.1.11 testpath @ file:///home/ktietz/src/ci/testpath_1611930608132/work text-unidecode==1.3 textblob==0.15.3 thinc==8.0.10 threadpoolctl==2.2.0 tokenizers==0.8.1rc2 toml==0.10.2 toolz==0.11.1 torch==1.9.0 torchvision==0.10.0 tornado @ file:///opt/concourse/worker/volumes/live/d531d395-893c-4ca1-6a5f-717b318eb08c/volume/tornado_1606942307627/work tqdm @ file:///tmp/build/80754af9/tqdm_1615925068909/work traitlets @ file:///home/ktietz/src/ci/traitlets_1611929699868/work transformers==3.3.1 twine==3.4.2 typer==0.3.2 typing-extensions @ file:///home/ktietz/src/ci_mi/typing_extensions_1612808209620/work tzlocal==3.0 ua-parser==0.10.0 urllib3 @ file:///tmp/build/80754af9/urllib3_1615837158687/work user-agents==2.2.0 validators==0.18.2 verstack==0.4.4 virtualenv==20.4.4 visions==0.7.1 wasabi==0.8.2 wcwidth @ 
file:///tmp/build/80754af9/wcwidth_1593447189090/work webencodings==0.5.1 Werkzeug==2.0.1 widgetsnbextension==3.5.1 wordcloud==1.5.0 wrapt==1.12.1 wurlitzer @ file:///opt/concourse/worker/volumes/live/a07f2ad6-5a18-4b19-78a6-11eea045a34d/volume/wurlitzer_1617224647004/work xgboost==1.1.0 xlrd==1.2.0 xmltodict==0.12.0 zipfile36==0.1.3 zipp @ file:///tmp/build/80754af9/zipp_1615904174917/work

Additional context

sbrugman commented 2 years ago

Any contribution on reducing the warnings is welcome. I'd suggest sharing the relevant warnings here.
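
To make the relevant warnings easy to share, the two sets can be compared as plain text. This is a rough sketch under the assumption that the warning messages from each run have been copied by hand into lists of strings; it does not call the profiling library, and the function name `compare_warnings` and the sample strings are hypothetical placeholders.

```python
from typing import Dict, Iterable, List


def compare_warnings(
    without_config: Iterable[str], with_config: Iterable[str]
) -> Dict[str, List[str]]:
    """Group warning texts by which of the two runs produced them."""
    # Normalise whitespace and drop empty lines before comparing.
    first = {w.strip() for w in without_config if w.strip()}
    second = {w.strip() for w in with_config if w.strip()}
    return {
        "only_without_config_file": sorted(first - second),
        "only_with_config_file": sorted(second - first),
        "in_both": sorted(first & second),
    }


if __name__ == "__main__":
    # Placeholder texts, not actual warnings from the reports in this issue.
    run_a = ["warning A", "warning B", "warning C"]
    run_b = ["warning B"]
    for group, items in compare_warnings(run_a, run_b).items():
        print(group, items)
```

The `only_without_config_file` group is the interesting one here, since it lists the warnings that disappear once config_file is passed.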

DanilZherebtsov commented 2 years ago

Oh, sorry, I mean the warnings generated by the Profile Report itself. Please see the screenshot below.

DanilZherebtsov commented 2 years ago
image