ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

importing ProfileReport seems to interfere with matplotlib plotting in jupyter notebook #837 #888

Open jodom961 opened 2 years ago

jodom961 commented 2 years ago

Describe the bug

If you import matplotlib in a Jupyter notebook and run:

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# Silly example data
bp_x = np.linspace(0, 2*np.pi, num=40, endpoint=True)
bp_y = np.sin(bp_x)

# Make the plot
plt.plot(bp_x, bp_y, linewidth=3, linestyle="--",
         color="blue", label=r"Legend label $\sin(x)$")
plt.xlabel(r"Description of $x$ coordinate (units)")
plt.ylabel(r"Description of $y$ coordinate (units)")
plt.title(r"Title here (remove for papers)")
plt.xlim(0, 2*np.pi)
plt.ylim(-1.1, 1.1)
plt.legend(loc="lower left")
plt.show()

You get a plot.

If you then import pandas-profiling > 3.0.0 (for example 3.1.0), it breaks matplotlib:

from pandas_profiling import ProfileReport

* Run the plotting code again
* No plot appears

To Reproduce

We would need to reproduce your scenario before being able to resolve it.

Version information:

Version information is essential in reproducing and resolving bugs. Please report:

_Python version_: 3.7.11
_Environment_: Jupyter Notebook, local (Python Debian Buster Docker image)

pip freeze:

keras==2.6.0
sklearn==0.0
jupyter==1.0.0
ipykernel==6.4.1
bash_kernel==0.7.2
kotlin-jupyter-kernel==0.10.0.249
scipy==1.7.1
simpy==4.0.1
matplotlib==3.4.3
numpy==1.21.2
pandas==1.3.3
requests==2.26.0
plotly==5.3.1
sympy==1.8
Pillow==8.3.2
h5py==3.4.0
google-api-python-client==2.23.0
tornado==6.0.3
papermill==2.3.3
notebook==6.0.2
pandas_profiling==3.1.0

sbrugman commented 2 years ago

Might come from setting the backend, could you check? https://github.com/pandas-profiling/pandas-profiling/blob/develop/src/pandas_profiling/config.py#L8

jodom961 commented 2 years ago

Yeah, I can confirm that setting the backend to agg breaks plotting in Jupyter notebooks. The default, as reported by mpl.get_backend(), appears to be 'module://matplotlib_inline.backend_inline'. Maybe also related to #837?
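As a possible stopgap on the user side, the backend could be recorded before the import and restored afterwards. This is only a minimal sketch: `preserve_matplotlib_backend` is a hypothetical helper name, not part of pandas-profiling or matplotlib, and it assumes the import itself is what switches the backend.

```python
from contextlib import contextmanager


@contextmanager
def preserve_matplotlib_backend():
    """Hypothetical helper: restore the matplotlib backend if the wrapped code changes it."""
    # Imported lazily so that merely defining the helper has no side effects.
    import matplotlib

    backend_before = matplotlib.get_backend()
    try:
        yield
    finally:
        # Only switch back if something inside the block changed the backend.
        if matplotlib.get_backend() != backend_before:
            matplotlib.use(backend_before)


# Intended usage in a notebook cell (requires pandas-profiling to be installed):
#
#     with preserve_matplotlib_backend():
#         from pandas_profiling import ProfileReport
```

Whether this is enough depends on when the backend is actually changed; if it happens later (for example while a report is generated), that call would need to be wrapped as well.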

Given that so many people use Jupyter for data science/pandas work, maybe it could inspect the environment to see whether IPython/Jupyter is being used and, if so, set a different backend?
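A rough sketch of what that environment check might look like is below. The function names (`in_jupyter_kernel`, `pick_backend`, `configure_backend`) are made up for illustration and are not existing pandas-profiling APIs. Checking the shell class name is a common heuristic for spotting a Jupyter kernel, not an official guarantee.

```python
import sys


def in_jupyter_kernel() -> bool:
    """Best-effort check for a Jupyter (ZMQ-based) IPython kernel."""
    ipython = sys.modules.get("IPython")
    if ipython is None:
        # IPython was never imported, so this cannot be a kernel process.
        return False
    get_ipython = getattr(ipython, "get_ipython", None)
    shell = get_ipython() if callable(get_ipython) else None
    # Heuristic: Jupyter kernels use ZMQInteractiveShell, while a terminal
    # IPython session uses a different shell class.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


def pick_backend(headless_backend: str = "agg"):
    """Return the backend to force, or None to leave the current one alone."""
    return None if in_jupyter_kernel() else headless_backend


def configure_backend() -> None:
    """Apply the choice; matplotlib is imported lazily, only when needed."""
    backend = pick_backend()
    if backend is not None:
        import matplotlib

        matplotlib.use(backend)
```

Inside a notebook `pick_backend()` returns `None`, so the inline backend stays active; in a plain Python process it falls back to `agg`, which matches the setting discussed above.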

sbrugman commented 2 years ago

@jodom961 Yes, we should do this. Would you be interested in contributing a PR?

jodom961 commented 2 years ago

For sure, will do shortly.