ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

ValueError: Only supported for TrueType fonts (Bug Report) #1355

Open BnBear123 opened 1 year ago

BnBear123 commented 1 year ago

Current Behaviour

I just ran the example "NASA Meteorites" (comprehensive set of meteorite landings: object properties and locations) in Colab and got the following error:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

22 frames
/usr/local/lib/python3.10/dist-packages/PIL/ImageDraw.py in textbbox(self, xy, text, font, anchor, spacing, align, direction, features, language, stroke_width, embedded_color)
    649             font = self.getfont()
    650         if not isinstance(font, ImageFont.FreeTypeFont):
--> 651             raise ValueError("Only supported for TrueType fonts")
    652         mode = "RGBA" if embedded_color else self.fontmode
    653         bbox = font.getbbox(

ValueError: Only supported for TrueType fonts

Expected Behaviour

The report is rendered without errors.

Data Description

https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh

Code that reproduces the bug

!pip install -U pandas-profiling

import numpy as np
import pandas as pd

import ydata_profiling
from ydata_profiling.utils.cache import cache_file

file_name = cache_file(
    "meteorites.csv",
    "https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv?accessType=DOWNLOAD",
)

df = pd.read_csv(file_name)

# Note: Pandas does not support dates before 1880, so we ignore these for this analysis
df["year"] = pd.to_datetime(df["year"], errors="coerce")

# Example: Constant variable
df["source"] = "NASA"

# Example: Boolean variable
df["boolean"] = np.random.choice([True, False], df.shape[0])

# Example: Mixed with base types
df["mixed"] = np.random.choice([1, "A"], df.shape[0])

# Example: Highly correlated variables
df["reclat_city"] = df["reclat"] + np.random.normal(scale=5, size=(len(df)))

# Example: Duplicate observations
duplicates_to_add = pd.DataFrame(df.iloc[0:10])
duplicates_to_add["name"] = duplicates_to_add["name"] + " copy"

df = df.append(duplicates_to_add, ignore_index=True)

report = df.profile_report(
    sort=None, html={"style": {"full_width": True}}, progress_bar=False
)
report    # <-- this produces the error

pandas-profiling version

v3.6.6

Dependencies

pandas==1.5.3
ydata_profiling==v4.2.0

OS

Colab

Checklist

phamthaihoangtung commented 1 year ago

I also hit this issue with ydata_profiling==v4.2.0.

huaji1992 commented 1 year ago

+1

huaji1992 commented 1 year ago

https://colab.research.google.com/github/ydataai/pandas-profiling/blob/master/examples/meteorites/meteorites_cloud.ipynb

lesliecamedics commented 1 year ago

The issue you're encountering is with the WordCloud library, but fortunately it has a simple solution. Run pip install --upgrade pip and then pip install --upgrade Pillow to make sure that you have the latest version of the Pillow library, which WordCloud relies on for its operations.

These steps should help to resolve the problem as described in the following GitHub thread: https://github.com/amueller/word_cloud/issues/729.
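
To confirm that the upgrade took effect in the running session, one option is to print the installed versions of the packages involved. This is a minimal sketch using only the standard library; the helper name installed_versions and the package list are illustrative assumptions, not part of ydata-profiling or WordCloud.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as published on PyPI; adjust the list as needed.
    report = installed_versions(
        ["Pillow", "wordcloud", "ydata-profiling", "pandas-profiling"]
    )
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

If Pillow was already imported before the upgrade, a runtime restart may be needed in Colab before the new version is picked up.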

BnBear123 commented 11 months ago

I think the fix is to change the install command in the notebook. Before: !pip install -U pandas-profiling. After: !pip install -U ydata-profiling.

After that change, I can run the notebook and get the result: https://colab.research.google.com/github/ydataai/pandas-profiling/blob/master/examples/meteorites/meteorites_cloud.ipynb

coyne44 commented 5 months ago

Thanks @BnBear123 , that also worked for me!