ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

Generating ProfileReport from pickle files does not work (methods ProfileReport.dumps() and ProfileReport.loads()) #1033

Open Tuxedo94 opened 2 years ago

Tuxedo94 commented 2 years ago

Current Behaviour

Expected Behaviour

Expected a ProfileReport to be generated from the loaded pickle file when invoking profile.to_file('report.html'). I expected it to be possible to generate the ProfileReport from the loaded pickle file without the original pd.DataFrame.

Data Description

df = pd.DataFrame([['a', 'b'], ['c', 'd']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])

Code that reproduces the bug

import pandas as pd
from pandas_profiling import ProfileReport
import pickle

# create dummy data as pd.DataFrame
df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                index=['row 1', 'row 2'],
                columns=['col 1', 'col 2'])
# generate report from pd.DataFrame
profile = ProfileReport(df)
# save report locally as pickle file
_bytes = profile.dumps()
with open('report.pickle', 'wb') as f:
    pickle.dump(_bytes, f, pickle.HIGHEST_PROTOCOL)
# create empty report object
profile = ProfileReport()
# load report pickle file from local
data = pickle.load(open('report.pickle', "rb"))
profile.loads(data)
#! generate report from loaded pickle file
profile.to_file('report.html')

pandas-profiling version

v3.2.0

Dependencies

pandas==1.3.5
pandas-profiling==3.2.0

OS

Windows 10

fabclmnt commented 1 year ago

Hi @Tuxedo94,

I think I understand the issue. Before dumping the ProfileReport, you need to trigger the computation of the report itself (by calling a method such as to_file). Let me know if this solves your issue.
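
To illustrate why the order of operations could matter here, below is a minimal toy model using only the standard library. It assumes the report is computed lazily and that dumps() serializes whatever state is cached at that moment. LazyReport, describe() and the n_rows summary are hypothetical names invented for this sketch; this is not pandas-profiling's implementation, and it has not been checked against v3.2.0.

```python
import pickle


class LazyReport:
    """Hypothetical stand-in for a lazily computed report (not the library's class)."""

    def __init__(self, rows=None):
        self.rows = rows          # source data; None for an "empty" report
        self._description = None  # cached summary, filled on first use

    def describe(self):
        # Compute the summary once, on demand, and cache it.
        if self._description is None:
            if self.rows is None:
                raise ValueError("no data and no cached description")
            self._description = {"n_rows": len(self.rows)}
        return self._description

    def dumps(self):
        # Serialize only the cached state, whatever it currently is.
        return pickle.dumps(self._description)

    def loads(self, data):
        # Restore the cached state into this (possibly empty) report.
        self._description = pickle.loads(data)
        return self


rows = [["a", "b"], ["c", "d"]]

# Dumping before anything was computed stores an empty state.
too_early = LazyReport(rows).dumps()

# Triggering the computation first stores the actual summary.
report = LazyReport(rows)
report.describe()
computed = report.dumps()

# An empty report can then be rebuilt without the original data.
restored = LazyReport().loads(computed)
print(restored.describe())
```

In this model, loading too_early into an empty report leaves nothing to render, while loading computed restores the summary without the original rows. If the library behaves similarly, calling profile.to_file('report.html') before profile.dumps() in the reproduction script would be the analogous change.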