CLI output from the reproduction script below:
importing
reading data
loading as csv
generating report: './main/inputs.ignore.report.html'
Summarize dataset: 0%| | 0/5 [00:00<?, ?it/s]
zsh: killed ydata ./main/inputs.ignore.csv
/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Expected Behaviour
Generate the HTML report.
Data Description
Here are the contents of the CSV:
NOTE 1: Removing even a single row makes the freeze/hang go away.
NOTE 2: Despite this fragility, the behavior is consistent: it always works with one row removed, and always hangs when all rows are present.
data
1.8979166666666665
1.8770833333333332
1696285500.0
1.8
1.8010416666666667
1.8114583333333334
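To make the report easier to reproduce, here is a minimal standard-library sketch that recreates the CSV from the header and the six values listed above. For NOTE 1, it can also write one copy of the file per data row, each with that single row removed, so every variant can be fed to the script below. The function names and output paths are hypothetical, not part of the original report.

```python
import csv
from pathlib import Path

# The values listed under the "data" header above, in order.
VALUES = [
    "1.8979166666666665",
    "1.8770833333333332",
    "1696285500.0",
    "1.8",
    "1.8010416666666667",
    "1.8114583333333334",
]


def write_repro_csv(path):
    """Recreate the single-column CSV shown above at `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["data"])
        writer.writerows([v] for v in VALUES)
    return path


def write_drop_one_variants(src, out_dir):
    """Write one copy of `src` per data row, each with that one row removed.

    The header row is always kept. Returns the written paths in row order.
    """
    with open(src, newline="") as f:
        header, *rows = list(csv.reader(f))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(len(rows)):
        path = out / f"{Path(src).stem}.drop{i}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, lineterminator="\n")
            writer.writerow(header)
            writer.writerows(rows[:i] + rows[i + 1:])
        paths.append(path)
    return paths
```

For example, write_repro_csv("./main/inputs.ignore.csv") followed by write_drop_one_variants("./main/inputs.ignore.csv", "./main/drop_one"), where ./main/drop_one is a hypothetical output directory.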
Code that reproduces the bug
#!/usr/bin/env python3
# pip install ydata-profiling
print("importing")
import os
import sys
from io import StringIO

import pandas as pd
from ydata_profiling import ProfileReport

print("reading data")
filepath = sys.argv[1]
with open(filepath, "r") as f:
    output = f.read()

# Detect the comment character and delimiter from the raw text.
kwargs = dict(sep=",")
if output.startswith("#"):
    kwargs["comment"] = "#"
if output.count("\t") > output.count(","):
    kwargs["sep"] = "\t"

print("loading as csv")
# Use StringIO to create a file-like object from the string and pass the
# detected options (for the single-column CSV above they have no effect).
df = pd.read_csv(StringIO(output), **kwargs)
profile = ProfileReport(df, title="Profiling Report")

# Build "<dir>/<name without its last extension>.report.html",
# e.g. ./main/inputs.ignore.csv -> ./main/inputs.ignore.report.html
new_path_base = os.path.dirname(filepath)
basename = os.path.basename(filepath)
if "." not in basename:
    new_path_base += f"/{basename}"
else:
    new_path_base += "/" + ".".join(basename.split(".")[0:-1])
report_path = f"{new_path_base}.report.html"
print(f"generating report: {report_path!r}")
profile.to_file(report_path)
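Because the failure is a lock-up rather than an exception, the "always hangs / always works" observation in NOTE 2 is easiest to re-check with a harness that runs the script above in a child process and gives up after a timeout. This is a minimal standard-library sketch; the function names, the 60-second default, and the script filename in the usage line are hypothetical. It uses os.killpg, so it is POSIX-only (which includes macOS).

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return "ok", "timeout", "exit <code>" or "signal <n>".

    The child is started in a new session (its own process group) so that,
    on timeout, any worker processes it spawned are killed along with it.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    try:
        code = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()
        return "timeout"
    if code < 0:
        # Killed by a signal; 9 is SIGKILL, which zsh reports as "killed".
        return f"signal {-code}"
    return "ok" if code == 0 else f"exit {code}"


def check_repeatability(script, csv_path, runs=3, timeout_s=60.0):
    """Run `python script csv_path` several times and collect the outcomes."""
    cmd = [sys.executable, script, csv_path]
    return [run_with_timeout(cmd, timeout_s) for _ in range(runs)]
```

For example, check_repeatability("repro.py", "./main/inputs.ignore.csv"), where repro.py is a hypothetical name for the script above; a consistent hang would show up as a list of "timeout" (or "signal 9") results.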
Current Behaviour
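Profiling a 1-column, 7-row CSV file causes a total lock-up: progress stalls at "Summarize dataset: 0%" and the process ends with "zsh: killed". (The same happens with a larger file; I shrank it down to this minimal case.)
I think this may be a different problem from this issue and this issue.
The CLI output is shown at the top of this report.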
pandas-profiling version
v4.6.4
Dependencies
OS
macOS 12.6 (Monterey), Apple Silicon