ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

pandas.Series.to_dict() got an unexpected keyword argument 'orient' #1529

Closed michellyrds closed 9 months ago

michellyrds commented 9 months ago

Current Behaviour

The ProfileReport._render_json method calls to_dict(orient="records") on both pd.DataFrame and pd.Series objects. The orient keyword is only accepted by pd.DataFrame.to_dict, so the call raises a TypeError whenever the object is a pd.Series: https://github.com/ydataai/ydata-profiling/blob/cdfc17ac7c01a66a2f3bbf6641112149b1d83d90/src/ydata_profiling/profile_report.py#L453

https://pandas.pydata.org/docs/reference/api/pandas.Series.to_dict.html

    437                     return {encode_it(v) for v in o}
    438                 elif isinstance(o, (pd.DataFrame, pd.Series)):
--> 439                     return encode_it(o.to_dict(orient="records"))
    440                 elif isinstance(o, np.ndarray):
    441                     return encode_it(o.tolist())

TypeError: to_dict() got an unexpected keyword argument 'orient'
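
The mismatch can be reproduced without ydata-profiling at all. The snippet below is a minimal sketch that only uses pandas; the variable names (records, series_error) are illustrative and do not come from the library.

```python
import pandas as pd

# Small frame using the same column names as the datasets in this report.
df = pd.DataFrame(
    data=[(1000, 42), (900, 30)],
    columns=["rent_per_month", "total_area"],
)
series = df["rent_per_month"]

# DataFrame.to_dict accepts `orient` and returns one dict per row.
records = df.to_dict(orient="records")

# Series.to_dict has no `orient` parameter, so the same call fails.
try:
    series.to_dict(orient="records")
    series_error = None
except TypeError as exc:
    series_error = exc

print(records)
print(repr(series_error))
```

This is the same TypeError that surfaces from the isinstance(o, (pd.DataFrame, pd.Series)) branch shown in the traceback above.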

Expected Behaviour

comparison_report.to_json() should return the comparison report serialized as JSON, without raising an error.
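
One possible way to get there is to handle the two pandas types separately before encoding. The sketch below is an assumption about how such a branch could look, not the fix that was actually merged; to_jsonable is a hypothetical helper name and is not part of ydata-profiling.

```python
import json

import pandas as pd


def to_jsonable(obj):
    """Hypothetical helper: convert pandas objects to plain Python containers."""
    if isinstance(obj, pd.DataFrame):
        # One dict per row; `orient` is valid for DataFrame.to_dict.
        return obj.to_dict(orient="records")
    if isinstance(obj, pd.Series):
        # Series.to_dict takes no `orient`; it maps index label -> value.
        return obj.to_dict()
    return obj


df = pd.DataFrame(
    data=[(1000, 42), (900, 30)],
    columns=["rent_per_month", "total_area"],
)

frame_payload = to_jsonable(df)
series_payload = to_jsonable(df["total_area"])

# Both payloads are plain lists/dicts, so the standard encoder accepts them.
encoded = json.dumps({"frame": frame_payload, "series": series_payload})
print(encoded)
```

Note that json.dumps turns the integer index labels of the Series into string keys, so a consumer of the JSON sees "0" and "1" rather than 0 and 1.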

Data Description

previous_dataset

previous_dataset = pd.DataFrame(data=[(1000, 42), (900, 30), (1500, 40), (1800, 38)], columns=["rent_per_month", "total_area"])

current_dataset

current_dataset = pd.DataFrame(data=[(5000, 350), (9000, 600), (5000, 400), (3500, 500), (6000, 600)], columns=["rent_per_month", "total_area"])

Code that reproduces the bug

import pandas as pd

from ydata_profiling import ProfileReport

previous_dataset = pd.DataFrame(data=[(1000, 42), (900, 30), (1500, 40), (1800, 38)], columns=["rent_per_month", "total_area"])
current_dataset = pd.DataFrame(data=[(5000, 350), (9000, 600), (5000, 400), (3500, 500), (6000, 600)], columns=["rent_per_month", "total_area"])
previous_dataset_report = ProfileReport(
    previous_dataset, title="Previous dataset report"
)
current_dataset_report = ProfileReport(
    current_dataset, title="Current dataset report"
)
comparison_report = previous_dataset_report.compare(current_dataset_report)
comparison_report.to_json()

pandas-profiling version

v4.5.1

Dependencies

aiobotocore==1.4.2
aiohttp==3.9.1
aioitertools==0.11.0
aiosignal==1.3.1
appdirs==1.4.4
argon2-cffi==20.1.0
async-generator==1.10
async-timeout==4.0.3
attrs==20.3.0
awscli==1.32.26
backcall==0.2.0
bidict==0.21.4
bleach==3.3.0
boto3==1.17.106
botocore==1.20.106
butterfree==1.2.3
cassandra-driver==3.24.0
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
charset-normalizer==2.0.12
click==7.1.2
cmake==3.27.2
colorama==0.4.4
cycler==0.10.0
Cython==0.29.23
dacite==1.8.1
dbus-python==1.2.16
decorator==5.0.6
defusedxml==0.7.1
distlib==0.3.4
distro==1.4.0
distro-info==0.23+ubuntu1.1
docutils==0.16
entrypoints==0.3
facets-overview==1.0.0
filelock==3.6.0
frozenlist==1.4.1
fsspec==2021.8.1
geomet==0.2.1.post1
h3==3.7.6
hierarchical-conf==1.0.2
htmlmin==0.1.12
idna==2.10
ImageHash==4.3.1
ipykernel==5.3.4
ipython==7.22.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.17.2
Jinja2==2.11.3
jmespath==0.10.0
joblib==1.0.1
jsonschema==3.2.0
jupyter-client==6.1.12
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
koalas==1.8.2
MarkupSafe==2.0.1
matplotlib==3.4.2
mdutils==1.6.0
mistune==0.8.4
multidict==6.0.4
multimethod==1.10
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==3.1
notebook==6.3.0
numpy==1.22.4
packaging==23.2
pandas==1.3.5
pandocfilters==1.4.3
parameters-validation==1.2.0
parso==0.7.0
patsy==0.5.6
pexpect==4.8.0
phik==0.12.4
pickleshare==0.7.5
Pillow==8.2.0
pip-resolved==0.3.0
plotly==5.5.0
prometheus-client==0.10.1
prompt-toolkit==3.0.17
protobuf==3.17.2
psycopg2==2.8.5
ptyprocess==0.7.0
py4j==0.10.9
pyarrow==13.0.0
pyarrow-hotfix==0.5
pyasn1==0.5.1
pycparser==2.20
pydantic==1.9.2
pydeequ==0.1.8
Pygments==2.8.1
PyGObject==3.36.0
pyparsing==2.4.7
pyrsistent==0.17.3
pyspark==3.0.2
python-apt==2.0.1+ubuntu0.20.4.1
python-dateutil==2.8.1
python-engineio==4.3.0
python-socketio==5.4.1
pytz==2023.3
PyWavelets==1.4.1
PyYAML==5.4.1
pyzmq==20.0.0
requests==2.26.0
requests-unixsocket==0.2.0
rsa==4.7.2
s3fs==2021.8.1
s3transfer==0.4.2
scikit-learn==0.24.1
scipy==1.10.1
seaborn==0.11.1
Send2Trash==1.5.0
six==1.15.0
ssh-import-id==5.10
statsmodels==0.14.1
tangled-up-in-unicode==0.2.0
tenacity==8.0.1
terminado==0.9.4
testpath==0.4.4
threadpoolctl==2.1.0
tornado==6.1
tqdm==4.66.1
traitlets==5.0.5
typeguard==2.13.3
typer==0.3.2
typing-extensions==4.0.1
unattended-upgrades==0.1
urllib3==1.26.16
virtualenv==20.4.1
visions==0.7.5
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
wordcloud==1.9.2
wrapt==1.16.0
yamale==4.0.2
yarl==1.9.4
ydata-profiling==4.5.1

OS

Ubuntu 20.04.4 LTS

fabclmnt commented 9 months ago

Hey @michellyrds,

thank you for your feedback. We have added this issue to the next release.

alexbarros commented 9 months ago

Closed by https://github.com/ydataai/ydata-profiling/pull/1538