Open skunkyevil opened 1 year ago
I have the same issue with pandas 1.4.3 and ydata-profiling 1.4.4 and numpy 1.23.5. Funnily enough I only get the same error when I filter some columns from my original dataset.
Adding a column with unique values for each row solved my problem. This will obviously prevent the profiling from finding duplicate rows, but it's better than not being able to get the report at all.
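For anyone who wants to try the same workaround, here is a minimal sketch of it. The helper name `with_row_id` and the column name `row_id` are made up for illustration; only the pandas calls are real.

```python
import pandas as pd


def with_row_id(df, name="row_id"):
    """Return a copy of df with an extra column holding a unique value per row.

    Hypothetical helper: it only illustrates the workaround described above.
    """
    if name in df.columns:
        raise ValueError(f"column {name!r} already exists")
    out = df.copy()
    # 0, 1, 2, ... so that no two rows can be exact duplicates any more
    out[name] = range(len(out))
    return out


# Usage, assuming ydata-profiling is installed:
# from ydata_profiling import ProfileReport
# ProfileReport(with_row_id(df)).to_file("report.html")
```

Note that the duplicate-rows section of the report will then always be empty, as mentioned above.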
It probably has something to do with the numpy.histogram bug and floats: https://stackoverflow.com/questions/67342168/memoryerror-when-using-pandas-profiling-profile-report
Actually, this is indeed an np.histogram bug. The following code gives the same error with the same dataframe:
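import numpy as np
import pandas as pd

buggy_df = pd.read_csv('buggy_df.csv')
# Drop NaNs and convert the offending column to a NumPy array
a = buggy_df['str3d_net_profit_long'].dropna().to_numpy()
# Raises the same MemoryError as the profiling report
np.histogram(a, bins='auto')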
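A possible way to check whether the bin count is what blows up, before calling np.histogram at all, is sketched below. It assumes that `bins='auto'` uses the narrower of the Sturges and Freedman-Diaconis bin widths, as the NumPy documentation describes; newer NumPy releases may apply extra safeguards, so treat the estimate as a rough guide rather than NumPy's exact behaviour. The names `estimated_auto_bins`, `capped_histogram` and `max_bins` are hypothetical, and I have not verified that this is the cause for buggy_df.csv.

```python
import numpy as np


def estimated_auto_bins(values):
    """Rough estimate of how many bins the documented 'auto' rule would request."""
    a = np.asarray(values, dtype=float)
    a = a[np.isfinite(a)]
    n = a.size
    if n == 0:
        return 1.0
    data_range = float(a.max() - a.min())
    if data_range == 0.0:
        return 1.0
    q25, q75 = np.percentile(a, [25, 75])
    # Freedman-Diaconis width: tiny when the IQR is small relative to the range
    fd_width = 2.0 * (q75 - q25) / n ** (1.0 / 3.0)
    # Sturges width: depends only on the range and the sample size
    sturges_width = data_range / (np.log2(n) + 1.0)
    width = min(fd_width, sturges_width) if fd_width > 0 else sturges_width
    return float(np.ceil(data_range / width))


def capped_histogram(values, max_bins=1000):
    """Use bins='auto' only when the estimate is sane, else a fixed bin count."""
    a = np.asarray(values, dtype=float)
    a = a[np.isfinite(a)]
    bins = "auto" if estimated_auto_bins(a) <= max_bins else max_bins
    return np.histogram(a, bins=bins)
```

With the array `a` from the snippet above, `estimated_auto_bins(a)` would show whether the requested bin count is unreasonably large, and `capped_histogram(a)` gives a histogram with at most `max_bins` bins in that case.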
Hi @skunkyevil ,
From what you have reported, you are using an older version of the package. The current version of ydata-profiling is 4.5.0. Can you please let me know if the error remains?
Also, can you please provide more details about your dataset so we can better understand the problem?
Cheers.
Hi @fabclmnt ,
I just checked with ydata-profiling version 4.5.0 and the error still persists. I've included the dataset in my initial post: buggy_df.csv
I don't think this issue deserves to be handled separately; the effort would be better spent on solving the more general np.histogram bug. The only difference from the general case is that my dataset does not contain any huge outliers.
Current Behaviour
I came across a very weird MemoryError when trying to build a profile on a particular dataframe:
Error traceback is rather long, click to expand
```
MemoryError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_2680/3177531553.py in
```
Expected Behaviour
It should generate a report.
Data Description
Here is the dataframe that caused the bug, exported to CSV: buggy_df.csv
I couldn't make it smaller: even splitting it into 2 parts results in each part being processed normally, without errors.
Code that reproduces the bug
pandas-profiling version
v3.6.6
Dependencies
OS
Windows 10
Checklist