ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License
12.37k stars 1.67k forks

Error while getting the json of a compare report #1555

Open ronfisher21 opened 6 months ago

ronfisher21 commented 6 months ago

Current Behaviour

Hi, my code is pretty simple: I read two Parquet files, created a report for each pandas DataFrame, and used the compare method to generate a comparison report. When I called the 'to_json()' method to convert the comparison report to JSON, I got the following error: "TypeError: to_dict() got an unexpected keyword argument 'orient'"

I saw that you already resolved this issue in: fix: comparison to_json pd.Series encoding error #1538

I upgraded the package to the latest version and I still get the same error.
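The error message is consistent with `to_dict(orient=...)` being called on an object whose `to_dict` does not accept that keyword. In pandas, `DataFrame.to_dict` takes `orient`, while `Series.to_dict` does not. The sketch below reproduces that failure mode with stand-in classes and shows one defensive way an encoder could fall back. `FakeFrame`, `FakeSeries` and `encode` are hypothetical names for illustration only; this is not the library's actual serialization code, and whether it matches the real cause here is an assumption.

```python
import json


class FakeFrame:
    """Hypothetical stand-in for a DataFrame-like object: to_dict accepts `orient`."""

    def __init__(self, rows):
        self.rows = rows

    def to_dict(self, orient="records"):
        return list(self.rows)


class FakeSeries:
    """Hypothetical stand-in for a Series-like object: to_dict has no `orient`."""

    def __init__(self, data):
        self.data = data

    def to_dict(self):
        return dict(self.data)


def encode(obj):
    """Recursively convert nested containers and to_dict-capable objects to plain types."""
    if isinstance(obj, dict):
        return {str(key): encode(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [encode(value) for value in obj]
    if hasattr(obj, "to_dict"):
        try:
            return encode(obj.to_dict(orient="records"))
        except TypeError:
            # Series-like objects reject `orient`; retry without it.
            return encode(obj.to_dict())
    return obj


payload = {"s": FakeSeries({"a": 1}), "f": FakeFrame([{"x": 2}])}
print(json.dumps(encode(payload)))
```

Catching `TypeError` this broadly could also hide an unrelated `TypeError` raised inside `to_dict`, so an explicit type check would be the safer choice in real code.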

Expected Behaviour

I expected the report to convert to JSON successfully.

Data Description

The datasets I am using are confidential, but the data format is Parquet.

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport

df_ref = pd.read_parquet('dir/to/my/data/df_ref.parquet')
df_old = pd.read_parquet('dir/to/my/data/df_old.parquet')

ref_report = ProfileReport(df_ref, title='df ref report')
old_report = ProfileReport(df_old, title='df old report')

comparison_report = ref_report.compare(old_report)
comparison_report.to_json()
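Only the final error message is quoted above; the full traceback would show which `to_dict` call fails. A minimal sketch for capturing it follows. `capture_traceback` and `failing_call` are hypothetical helpers written for this example, and `failing_call` merely simulates the reported error so the snippet runs on its own.

```python
import traceback


def capture_traceback(fn, *args, **kwargs):
    """Call fn; return (result, None) on success or (None, traceback text) on failure."""
    try:
        return fn(*args, **kwargs), None
    except Exception:
        return None, traceback.format_exc()


def failing_call():
    # Simulates the error from the report; not the library's real code path.
    raise TypeError("to_dict() got an unexpected keyword argument 'orient'")


result, tb_text = capture_traceback(failing_call)
if tb_text is not None:
    print(tb_text)
```

With the reproduction script above, the call would be `capture_traceback(comparison_report.to_json)`, and the printed text can be pasted into the issue.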

pandas-profiling version

v4.6.4

Dependencies

adagio==0.2.4

aiofiles==23.2.1

aiosignal==1.3.1

alabaster==0.7.13

alembic==1.12.0

altair==5.1.1

annotated-types==0.6.0

ansi2html==1.8.0

antlr4-python3-runtime==4.11.1

anyio==3.7.1

appdirs==1.4.4

argon2-cffi==23.1.0

argon2-cffi-bindings==21.2.0

ast_decompiler==0.7.0

astatine==0.3.3

astor==0.8.1

astpretty==3.0.0

astroid==2.15.8

asttokens==2.4.0

async-lru==2.0.4

attrs==23.1.0

autoflake==1.7.8

autoviz==0.1.730

aws-secretsmanager-caching==1.1.1.5

awscli==1.32.37

Babel==2.12.1

backcall==0.2.0

bandit==1.7.7

beautifulsoup4==4.12.2

black==22.12.0

bleach==6.0.0

bokeh==2.4.3

boto3==1.34.37

botocore==1.34.37

cachetools==5.3.2

catboost==1.2.1

category-encoders==2.6.2

certifi==2023.7.22

cffi==1.15.1

charset-normalizer==3.2.0

click==8.1.7

cloudpickle==2.2.1

cmaes==0.10.0

cognitive-complexity==1.3.0

colorama==0.4.4

colorcet==3.0.1

colorlog==6.7.0

colour==0.1.5

comm==0.1.4

contourpy==1.1.0

coverage==6.5.0

cryptography==41.0.3

cycler==0.11.0

Cython==3.0.2

daal==2023.2.1

daal4py==2023.2.1

dacite==1.8.1

darglint==1.8.1

dash==2.13.0

dash-auth==2.0.0

dash-bootstrap-components==1.4.2

dash-core-components==2.0.0

dash-cytoscape==0.3.0

dash-html-components==2.0.0

dash-table==5.0.0

dash-testing-stub==0.0.2

dask==2023.5.0

databricks-cli==0.17.7

debugpy==1.6.7.post1

decorator==5.1.1

deepchecks==0.17.4

defusedxml==0.7.1

deprecation==2.1.0

dill==0.3.7

distlib==0.3.8

distributed==2023.5.0

dlint==0.14.1

doc8==0.11.2

docformatter==1.7.5

docker==6.1.3

docutils==0.16

domdf-python-tools==3.8.0.post2

dtreeviz==2.2.2

eli5==0.13.0

emoji==2.8.0

entrypoints==0.4

eradicate==2.3.0

evidently==0.2.8

exceptiongroup==1.1.3

executing==1.2.0

explainerdashboard==0.4.3

fairlearn==0.7.0

fastapi==0.103.1

fastjsonschema==2.18.0

ffmpy==0.3.1

filelock==3.12.3

flake8==4.0.1

flake8-2020==1.6.1

flake8-aaa==0.17.0

flake8-annotations==2.9.1

flake8-annotations-complexity==0.0.8

flake8-annotations-coverage==0.0.6

flake8-bandit==3.0.0

flake8-black==0.3.6

flake8-blind-except==0.2.1

flake8-breakpoint==1.1.0

flake8-broken-line==0.4.0

flake8-bugbear==22.12.6

flake8-builtins==1.5.3

flake8-class-attributes-order==0.1.3

flake8-coding==1.3.2

flake8-cognitive-complexity==0.1.0

flake8-commas==2.1.0

flake8-comments==0.1.2

flake8-comprehensions==3.14.0

flake8-debugger==4.1.2

flake8-django==1.4

flake8-docstrings==1.7.0

flake8-encodings==0.5.1

flake8-eradicate==1.4.0

flake8-executable==2.1.3

flake8-expression-complexity==0.0.11

flake8-fixme==1.1.1

flake8-functions==0.0.8

flake8-functions-names==0.4.0

flake8-future-annotations==0.0.5

flake8-helper==0.2.2

flake8-isort==4.2.0

flake8-literal==1.4.0

flake8-logging-format==0.9.0

flake8-markdown==0.3.0

flake8-mutable==1.2.0

flake8-no-pep420==2.7.0

flake8-noqa==1.4.0

flake8-pie==0.16.0

flake8-plugin-utils==1.3.3

flake8-polyfill==1.0.2

flake8-pyi==22.11.0

flake8-pylint==0.2.1

flake8-pytest-style==1.7.2

flake8-quotes==3.3.2

flake8-rst-docstrings==0.2.7

flake8-secure-coding-standard==1.4.1

flake8-slots==0.1.6

flake8-string-format==0.3.0

flake8-tidy-imports==4.10.0

flake8-typing-imports==1.12.0

flake8-use-fstring==1.4

flake8-use-pathlib==0.3.0

flake8-useless-assert==0.4.4

flake8-variables-names==0.0.6

flake8-warnings==0.4.0

flake8_simplify==0.21.0

Flask==2.2.3

flask-simplelogin==0.1.2

Flask-WTF==1.1.1

fonttools==4.42.1

frozenlist==1.4.0

fs==2.4.16

fsspec==2023.9.0

fugue==0.8.6

fugue-sql-antlr==0.1.6

future==0.18.3

gevent==23.9.0.post1

gitdb==4.0.10

GitPython==3.1.34

gradio==3.42.0

gradio_client==0.5.0

graphviz==0.20.1

greenlet==2.0.2

grpcio==1.57.0

gunicorn==20.1.0

h11==0.14.0

holoviews==1.14.9

htmlmin==0.1.12

httpcore==0.17.3

httpx==0.24.1

huggingface-hub==0.16.4

hvplot==0.7.3

hyperopt==0.2.7

hypothesis==6.97.1

hypothesmith==0.1.9

idna==3.4

ImageHash==4.3.1

imageio==2.31.3

imagesize==1.4.1

imbalanced-learn==0.11.0

importlib-metadata==5.2.0

importlib-resources==6.0.1

iniconfig==2.0.0

interpret==0.4.4

interpret-core==0.4.4

ipykernel==6.25.2

ipython==7.34.0

ipython-genutils==0.2.0

ipywidgets==7.8.1

isort==5.13.2

itsdangerous==2.1.2

jedi==0.19.0

Jinja2==3.1.2

jmespath==1.0.1

joblib==1.3.2

json5==0.9.14

jsonpickle==3.0.2

jsonschema==4.19.0

jsonschema-specifications==2023.7.1

jupyter==1.0.0

jupyter-console==6.6.3

jupyter-dash==0.4.2

jupyter-events==0.7.0

jupyter-lsp==2.2.0

jupyter-server==1.24.0

jupyter_client==7.4.9

jupyter_core==5.3.1

jupyter_server_terminals==0.4.4

jupyterlab==4.0.5

jupyterlab-flake8==0.7.1

jupyterlab-pygments==0.2.2

jupyterlab-widgets==1.1.7

jupyterlab_server==2.24.0

kaleido==0.2.1

kiwisolver==1.4.5

kmodes==0.12.2

lark-parser==0.12.0

lazy-object-proxy==1.10.0

lazy_loader==0.3

libcst==0.4.10

lightgbm==4.1.0

lime==0.2.0.1

linkify-it-py==2.0.2

llvmlite==0.40.1

locket==1.0.0

lxml==4.9.3

m2cgen==0.10.0

Mako==1.2.4

Markdown==3.4.4

markdown-it-py==3.0.0

MarkupSafe==2.1.3

matplotlib==3.7.2

matplotlib-inline==0.1.6

mccabe==0.6.1

mdit-py-plugins==0.4.0

mdurl==0.1.2

mistune==3.0.1

mlflow==1.30.1

mlxtend==0.22.0

moto==4.2.2

mr-proper==0.0.7

msgpack==1.0.5

multimethod==1.9.1

multiprocess==0.70.15

mypy-extensions==1.0.0

natsort==8.4.0

nbclassic==1.0.0

nbclient==0.8.0

nbconvert==7.8.0

nbformat==5.9.2

nest-asyncio==1.5.7

networkx==3.1

nltk==3.8.1

notebook==6.5.6

notebook_shim==0.2.3

numba==0.57.1

numpy==1.23.5

nvidia-ml-py==12.535.133

nvitop==1.3.2

oauthlib==3.2.2

optuna==3.3.0

orjson==3.9.5

outcome==1.2.0

overrides==7.4.0

oyaml==1.0

packaging==21.3

pandas==2.0.3

pandas-dq==1.28

pandas-vet==0.2.3

pandocfilters==1.5.0

panel==0.14.4

param==1.13.0

parso==0.8.3

partd==1.4.1

pathspec==0.9.0

patsy==0.5.3

pbr==6.0.0

pep8-naming==0.12.1

percy==2.0.2

pexpect==4.8.0

phik==0.12.3

pickleshare==0.7.5

Pillow==10.0.0

pkg_resources==0.0.0

pkgutil_resolve_name==1.3.10

platformdirs==3.10.0

plotly==5.16.1

plotly-resampler==0.9.1

pluggy==1.3.0

pmdarima==2.0.3

polars==0.19.2

prometheus-client==0.17.1

prometheus-flask-exporter==0.22.4

prompt-toolkit==3.0.39

protobuf==4.24.2

psutil==5.9.5

psycopg2-binary==2.9.9

ptyprocess==0.7.0

pure-eval==0.2.2

py==1.11.0

py4j==0.10.9.7

pyamg==5.0.1

pyaml==23.9.2

pyarrow==13.0.0

pyasn1==0.5.0

pybetter==0.4.1

pycaret==3.0.4

pycln==1.3.5

pycodestyle==2.8.0

pycparser==2.21

pyct==0.5.0

pydantic==2.6.2

pydantic-settings==2.1.0

pydantic_core==2.16.3

pydocstyle==6.3.0

pydub==0.25.1

pyemojify==0.2.0

pyflakes==2.4.0

Pygments==2.16.1

PyJWT==2.8.0

pylint==2.17.7

PyMySQL==1.1.0

PyNaCl==1.5.0

pynndescent==0.5.10

PyNomaly==0.3.3

pyod==1.1.0

pyOpenSSL==23.2.0

pyparsing==3.0.9

PySocks==1.7.1

pytest==7.4.1

pytest-cov==3.0.0

pytest-sugar==0.9.7

python-dateutil==2.8.2

python-dev-tools==2022.5.27

python-dotenv==1.0.1

python-json-logger==2.0.7

python-multipart==0.0.6

python-utils==3.7.0

pytz==2022.7.1

pyupgrade==2.38.4

pyviz_comms==3.0.0

PyWavelets==1.4.1

PyYAML==6.0.1

pyzmq==23.2.1

qpd==0.4.4

qtconsole==5.4.4

QtPy==2.4.0

querystring-parser==1.2.4

ray==2.6.3

referencing==0.30.2

regex==2023.8.8

removestar==1.5

requests==2.31.0

responses==0.23.3

restructuredtext-lint==1.4.0

retrying==1.3.4

rfc3339-validator==0.1.4

rfc3986-validator==0.1.1

rich==13.7.0

rpds-py==0.10.2

rsa==4.7.2

s3transfer==0.10.0

SALib==1.4.7

schemdraw==0.15

scikit-base==0.5.1

scikit-image==0.21.0

scikit-learn==1.2.2

scikit-learn-intelex==2023.2.1

scikit-optimize==0.9.0

scikit-plot==0.3.7

scipy==1.10.1

seaborn==0.12.2

selenium==4.2.0

semantic-version==2.10.0

Send2Trash==1.8.2

setuptools-scm==7.1.0

shap==0.42.1

six==1.16.0

skope-rules==1.0.1

sktime==0.22.0

slicer==0.0.7

smmap==5.0.0

sniffio==1.3.0

snowballstemmer==2.2.0

sortedcontainers==2.4.0

soupsieve==2.5

Sphinx==4.5.0

sphinxcontrib-applehelp==1.0.4

sphinxcontrib-devhelp==1.0.2

sphinxcontrib-htmlhelp==2.0.1

sphinxcontrib-jsmath==1.0.1

sphinxcontrib-qthelp==1.0.3

sphinxcontrib-serializinghtml==1.1.5

SQLAlchemy==1.4.49

sqlglot==18.2.0

sqlparse==0.4.4

ssort==0.12.3

stack-data==0.6.2

starlette==0.27.0

statsforecast==1.6.0

statsmodels==0.14.0

stdlib-list==0.10.0

stevedore==5.1.0

tabulate==0.9.0

tangled-up-in-unicode==0.2.0

tbats==1.1.3

tbb==2021.10.0

tblib==2.0.0

tenacity==8.2.3

tensorboardX==2.6.2.2

termcolor==2.4.0

terminado==0.17.1

textblob==0.17.1

threadpoolctl==3.2.0

tifffile==2023.7.10

tinycss2==1.2.1

tokenize-rt==4.2.1

toml==0.10.2

tomli==2.0.1

tomlkit==0.12.3

toolz==0.12.0

tornado==6.3.3

tox==3.28.0

tox-travis==0.13

tqdm==4.66.1

trace-updater==0.0.9.1

traitlets==5.9.0

treeinterpreter==0.2.3

triad==0.9.1

trio==0.22.2

trio-websocket==0.10.3

tsdownsample==0.1.2

tune-sklearn==0.4.6

typeguard==4.1.5

typer==0.4.2

types-PyYAML==6.0.12.11

typing-inspect==0.9.0

typing_extensions==4.7.1

tzdata==2024.1

uc-micro-py==1.0.2

umap-learn==0.5.3

untokenize==0.1.1

urllib3==1.26.16

urllib3-secure-extra==0.1.0

uvicorn==0.23.2

virtualenv==20.25.0

virtualenv-clone==0.5.7

visions==0.7.5

waitress==2.1.2

wcwidth==0.2.6

webencodings==0.5.1

websocket-client==1.6.2

websockets==11.0.3

wemake-python-styleguide==0.16.1

Werkzeug==2.2.3

widgetsnbextension==3.6.6

wordcloud==1.9.2

wrapt==1.16.0

wsproto==1.2.0

WTForms==3.0.1

wurlitzer==3.0.3

xgboost==1.7.6

xlrd==2.0.1

xmltodict==0.13.0

xxhash==3.3.0

xyzservices==2023.7.0

ydata-profiling==4.6.4

yellowbrick==1.5

zict==3.0.0

zipp==3.16.2

zope.event==5.0

zope.interface==6.0

OS

ubuntu

fabclmnt commented 5 months ago

Hi @ronfisher21 ,

Can you please test with the latest version of the package? We have just double-checked, and we are able to extract the information from the compare report with no errors.

Example of the code used:

import pandas as pd
from ydata_profiling import ProfileReport

og_df = pd.read_csv('sample_data/california_housing_train.csv')
df = pd.read_csv('sample_data/california_housing_test.csv')

report = ProfileReport(og_df, title='Train dataset houses')
report_test = ProfileReport(df, title='Test dataset houses')

compare = report.compare(report_test)

# using a variable to store the JSON output
compare_json = compare.to_json()

# storing the JSON output as a file
compare.to_file('compare.json')

Attached you can find the JSON output.

compare.json
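Since the dependency list in the report shows ydata-profiling==4.6.4, it may be worth confirming which version the failing interpreter actually imports before retesting. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the snippet only reports what is installed without assuming which release contains the fix.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Run this in the same environment (virtualenv, notebook kernel) that raises the error.
for name in ("ydata-profiling", "pandas"):
    print(name, installed_version(name) or "not installed")
```

If the printed version differs from the one just installed with pip, the script is probably running in a different environment than the one that was upgraded.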