ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

missing data matrix not rendering correctly #595

Closed: TrentonBush closed this issue 3 years ago.

TrentonBush commented 4 years ago

First, thanks for saving so many hours of boilerplate EDA with this project. I appreciate it!

I ran into an issue where the missing data matrix renders missing values as blurred bars that bleed into neighboring columns and rows:

[screenshot: missing data matrix from the pandas-profiling report]

For comparison, I made the same matrix plot directly with missingno in a Jupyter notebook:

[screenshot: missingno matrix of the same data]

The plots above were made with toy 10x10 data (code below). With larger datasets (50k x 30, in my case), Firefox renders the matrix differently than Chrome/Edge (all bad, just differently bad), so I'm wondering if it's an encoding issue. The problem is consistent across `to_widgets()`, `to_notebook_iframe()` and `to_file("report.html")` output.
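To check how the rendering behaves at the reporter's scale, a frame of roughly the same shape (50k rows x 30 columns) can be generated synthetically. This is a minimal sketch: the helper name `make_frame_with_gaps`, the 5% missing rate and the random seed are illustrative assumptions, not values taken from the report.

```python
import numpy as np
import pandas as pd


def make_frame_with_gaps(n_rows: int, n_cols: int, missing_frac: float, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: float frame with NaNs placed uniformly at random."""
    rng = np.random.default_rng(seed)
    # Values in [0, 1), so the only NaNs are the ones inserted below
    values = rng.random((n_rows, n_cols))
    # Boolean mask: True where a cell should become missing
    mask = rng.random((n_rows, n_cols)) < missing_frac
    values[mask] = np.nan
    return pd.DataFrame(values, columns=[f"col_{i}" for i in range(n_cols)])


df_large = make_frame_with_gaps(50_000, 30, missing_frac=0.05)
print(df_large.shape)                   # (50000, 30)
print(df_large.isna().to_numpy().mean())  # close to 0.05
```

The resulting frame can be passed to `pandas_profiling.ProfileReport(...)` in the same way as the toy frame in the reproduction code, and the saved report opened in each browser.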

Have you seen this before? Here is the output HTML report.
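Since an encoding issue is suspected, one way to narrow it down is to check how the plots are embedded in the saved report. The sketch below assumes images are inlined either as base64 data URIs (for example `data:image/png;base64,...` or `data:image/svg+xml;base64,...`) or as inline `<svg>` elements; the function name `embedded_image_formats` is hypothetical, and a regex scan is only a rough diagnostic, not a full HTML parser.

```python
import re
from collections import Counter

# Captures the MIME subtype of inlined base64 images, e.g. "png" or "svg+xml"
DATA_URI = re.compile(r"data:image/([\w.+-]+);base64,")
# Matches opening <svg> tags; closing </svg> tags do not match
INLINE_SVG = re.compile(r"<svg[\s>]", re.IGNORECASE)


def embedded_image_formats(html: str) -> Counter:
    """Hypothetical helper: count how images are embedded in an HTML document."""
    counts = Counter(m.group(1).lower() for m in DATA_URI.finditer(html))
    inline = len(INLINE_SVG.findall(html))
    if inline:
        counts["inline-svg"] = inline
    return counts


sample = '<img src="data:image/png;base64,AAAA"><svg width="10"></svg>'
print(embedded_image_formats(sample))
```

On the real report this could be called as `embedded_image_formats(pathlib.Path("bad_render.html").read_text(encoding="utf-8"))`, which shows whether the missing-value plots are raster (PNG) or vector (SVG) images.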

To Reproduce

Code:

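```python
"""
Test for issue XXX:
https://github.com/pandas-profiling/pandas-profiling/issues/XXX
"""
import numpy as np
import pandas as pd
import pandas_profiling

# 10x10 frame of ones with integer column labels 0..9
df = pd.DataFrame({i: [1.0] * 10 for i in range(10)})
df.loc[:, 5] = np.nan    # column 5 entirely missing
df.loc[3:5, 7] = np.nan  # column 7 missing in rows 3-5 (label slice, inclusive)
df.loc[1, 1] = np.nan    # single missing cell

# Disable sections that are not needed to reproduce the rendering issue
pandas_profiling.ProfileReport(
    df, correlations=None, interactions=None, duplicates=None
).to_file("bad_render.html")
```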
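The toy frame has a known missingness pattern, which can serve as ground truth when judging the rendered matrix: column 5 is entirely missing, column 7 is missing in rows 3 to 5 (`.loc` slicing is label-based and inclusive), and column 1 is missing in row 1 only. A quick pandas check of that pattern:

```python
import numpy as np
import pandas as pd

# Same toy frame as in the reproduction code
df = pd.DataFrame({i: [1.0] * 10 for i in range(10)})
df.loc[:, 5] = np.nan
df.loc[3:5, 7] = np.nan
df.loc[1, 1] = np.nan

# Missing cells per column: this is what each column of the matrix should show
missing_per_column = df.isna().sum()
print(missing_per_column[missing_per_column > 0])

# Row labels of the missing cells in column 7
print(df.index[df[7].isna()].tolist())
```

In a correct rendering, each of these gaps should appear as sharp cells confined to its own row and column, as in the missingno plot.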

Version information:



``` dependencies: - _libgcc_mutex=0.1=conda_forge - _openmp_mutex=4.5=0_gnu - abseil-cpp=20200225.1=he1b5a44_2 - aiofiles=0.5.0=py_0 - aiohttp=3.6.2=py38h516909a_0 - altair=4.0.1=py_0 - appdirs=1.4.3=py_1 - arrow=0.15.6=py38h32f6830_1 - arrow-cpp=0.16.0=py38hd8d096e_1 - async-timeout=3.0.1=py_1000 - attrs=19.3.0=py_0 - aws-sdk-cpp=1.7.164=h1f8afcc_0 - backcall=0.1.0=py_0 - beautifulsoup4=4.9.1=py38h32f6830_0 - binaryornot=0.4.4=py_1 - black=19.10b0=py38_0 - bleach=3.1.3=pyh8c360ce_0 - bokeh=2.0.1=py38h32f6830_0 - boost-cpp=1.72.0=h8e57a91_0 - brotli=1.0.7=he1b5a44_1001 - brotlipy=0.7.0=py38h1e0a361_1000 - bzip2=1.0.8=h516909a_2 - c-ares=1.15.0=h516909a_1001 - ca-certificates=2020.6.20=hecda079_0 - certifi=2020.6.20=py38h32f6830_0 - cffi=1.14.0=py38hd463f26_0 - cftime=1.1.1.2=py38h8790de6_0 - chardet=3.0.4=py38h32f6830_1006 - click=7.1.1=pyh8c360ce_0 - cloudpickle=1.4.1=py_0 - colorama=0.4.3=py_0 - confuse=1.3.0=pyh9f0ad1d_0 - cookiecutter=1.7.2=pyh9f0ad1d_0 - cryptography=2.8=py38h766eaa4_2 - curl=7.68.0=hf8cf82a_0 - cycler=0.10.0=py_2 - cytoolz=0.10.1=py38h516909a_0 - dask=2.15.0=py_0 - dask-core=2.15.0=py_0 - dask-labextension=2.0.2=py_0 - dbus=1.13.6=he372182_0 - decorator=4.4.2=py_0 - defusedxml=0.6.0=py_0 - distributed=2.15.2=py38h32f6830_0 - entrypoints=0.3=py38h32f6830_1001 - expat=2.2.9=he1b5a44_2 - flake8=3.7.9=py38h32f6830_1 - fontconfig=2.13.1=h86ecdb6_1001 - freetype=2.10.1=he06d7ca_0 - fsspec=0.7.3=py_0 - gettext=0.19.8.1=hc5be6a0_1002 - gflags=2.2.2=he1b5a44_1002 - glib=2.58.3=py38h73cb85d_1003 - glog=0.4.0=he1b5a44_1 - grpc-cpp=1.27.3=h7397029_1 - gst-plugins-base=1.14.5=h0935bb2_2 - gstreamer=1.14.5=h36ae1b5_2 - h11=0.9.0=py_0 - h2=3.2.0=py38h32f6830_1 - hdf4=4.2.13=hf30be14_1003 - hdf5=1.10.5=nompi_h3c11f04_1104 - heapdict=1.0.1=py_0 - hpack=3.0.0=py_0 - hstspreload=2020.5.13=py_0 - htmlmin=0.1.12=py_1 - httpcore=0.10.2=py_0 - httpx=0.14.2=py_0 - hyperframe=5.2.0=py_0 - icu=64.2=he1b5a44_1 - idna=2.9=py_1 - imagehash=4.1.0=pyh9f0ad1d_0 - 
importlib-metadata=1.5.0=py38h32f6830_1 - importlib_metadata=1.5.0=1 - ipykernel=5.1.4=py38h5ca1d4c_0 - ipython=7.13.0=py38h23f93f0_1 - ipython_genutils=0.2.0=py_1 - ipywidgets=7.5.1=py_0 - jedi=0.16.0=py38h32f6830_1 - jinja2=2.11.1=py_0 - jinja2-time=0.2.0=py_2 - joblib=0.14.1=py_0 - jpeg=9c=h14c3975_1001 - json5=0.9.0=py_0 - jsonschema=3.2.0=py38h32f6830_1 - jupyter-server-proxy=1.5.0=py_0 - jupyter_client=6.1.0=py_0 - jupyter_contrib_core=0.3.3=py_2 - jupyter_contrib_nbextensions=0.5.1=py38_0 - jupyter_core=4.6.3=py38h32f6830_1 - jupyter_highlight_selected_word=0.2.0=py38_1000 - jupyter_latex_envs=1.4.6=py38_1000 - jupyter_nbextensions_configurator=0.4.1=py38_0 - jupyterlab=2.0.1=py_0 - jupyterlab_server=1.0.7=py_0 - kiwisolver=1.1.0=py38hbf85e49_1 - krb5=1.16.4=h2fd8d38_0 - ld_impl_linux-64=2.34=h53a641e_0 - libblas=3.8.0=14_openblas - libcblas=3.8.0=14_openblas - libclang=9.0.1=default_hde54327_0 - libcurl=7.68.0=hda55be3_0 - libedit=3.1.20170329=hf8c457e_1001 - libevent=2.1.10=h72c5cf5_0 - libffi=3.2.1=he1b5a44_1007 - libgcc-ng=9.2.0=h24d8f2e_2 - libgfortran-ng=7.3.0=hdf63c60_5 - libgomp=9.2.0=h24d8f2e_2 - libiconv=1.15=h516909a_1006 - liblapack=3.8.0=14_openblas - libllvm8=8.0.1=hc9558a2_0 - libllvm9=9.0.1=hc9558a2_0 - libnetcdf=4.7.4=nompi_h9f9fd6a_101 - libopenblas=0.3.7=h5ec1e0e_6 - libpng=1.6.37=hed695b0_1 - libprotobuf=3.11.4=h8b12597_0 - libsodium=1.0.17=h516909a_0 - libssh2=1.8.2=h22169c7_2 - libstdcxx-ng=9.2.0=hdf63c60_2 - libtiff=4.1.0=hc7e4089_6 - libuuid=2.32.1=h14c3975_1000 - libuv=1.34.0=h516909a_0 - libwebp-base=1.1.0=h516909a_3 - libxcb=1.13=h14c3975_1002 - libxkbcommon=0.10.0=he1b5a44_0 - libxml2=2.9.10=hee79883_0 - libxslt=1.1.33=h31b3aaa_0 - line_profiler=3.0.2=py38hc9558a2_0 - llvmlite=0.31.0=py38h4f45e52_1 - locket=0.2.0=py_2 - lxml=4.5.0=py38hbb43d70_1 - lz4-c=1.8.3=he1b5a44_1001 - markupsafe=1.1.1=py38h1e0a361_1 - matplotlib=3.2.1=0 - matplotlib-base=3.2.1=py38h2af1d28_0 - mccabe=0.6.1=py_1 - memory_profiler=0.57.0=py_0 - 
missingno=0.4.2=py_1 - mistune=0.8.4=py38h516909a_1000 - more-itertools=8.2.0=py_0 - msgpack-python=1.0.0=py38hbf85e49_1 - multidict=4.7.5=py38h1e0a361_1 - mypy=0.770=py_0 - mypy_extensions=0.4.3=py38h32f6830_1 - nbconvert=5.6.1=py38_0 - nbformat=5.0.4=py_0 - ncurses=6.1=hf484d3e_1002 - netcdf4=1.5.3=nompi_py38heb6102f_103 - networkx=2.5=py_0 - nodejs=13.10.1=hf5d1a2b_0 - notebook=6.0.3=py38_0 - nspr=4.25=he1b5a44_0 - nss=3.47=he751ad9_0 - numba=0.48.0=py38hb3f55d8_0 - numpy=1.18.1=py38h95a1406_0 - olefile=0.46=py_0 - openssl=1.1.1h=h516909a_0 - packaging=20.1=py_0 - pandas=1.0.3=py38hcb8c335_0 - pandas-profiling=2.9.0=pyh9f0ad1d_0 - pandoc=2.9.2=0 - pandocfilters=1.4.2=py_1 - parquet-cpp=1.5.1=2 - parso=0.6.2=py_0 - partd=1.1.0=py_0 - pathspec=0.7.0=py_0 - patsy=0.5.1=py_0 - pcre=8.44=he1b5a44_0 - pexpect=4.8.0=py38h32f6830_1 - phik=0.10.0=py_0 - pickleshare=0.7.5=py38h32f6830_1001 - pillow=7.1.2=py38h9776b28_0 - pip=20.0.2=py_2 - pluggy=0.13.1=py38_0 - poyo=0.5.0=py_0 - prometheus_client=0.7.1=py_0 - prompt-toolkit=3.0.4=py_0 - psutil=5.7.0=py38h1e0a361_1 - pthread-stubs=0.4=h14c3975_1001 - ptyprocess=0.6.0=py_1001 - py=1.8.1=py_0 - pyarrow=0.16.0=py38hd02d5f2_2 - pycodestyle=2.5.0=py_0 - pycparser=2.20=py_0 - pyflakes=2.1.1=py_0 - pygments=2.6.1=py_0 - pyopenssl=19.1.0=py_1 - pyparsing=2.4.6=py_0 - pyqt=5.12.3=py38hcca6a23_1 - pyrsistent=0.15.7=py38h1e0a361_1 - pysocks=1.7.1=py38h32f6830_1 - pytest=5.4.1=py38h32f6830_0 - python=3.8.2=h9d8adfe_4_cpython - python-dateutil=2.8.1=py_0 - python-dotenv=0.13.0=pyh9f0ad1d_0 - python-slugify=4.0.0=pyh9f0ad1d_1 - python_abi=3.8=1_cp38 - pytz=2019.3=py_0 - pywavelets=1.1.1=py38hab2c0dc_2 - pyyaml=5.3.1=py38h1e0a361_0 - pyzmq=19.0.0=py38ha71036d_1 - qt=5.12.5=hd8c4c69_1 - re2=2020.03.03=he1b5a44_0 - readline=8.0=hf8c457e_0 - regex=2020.2.20=py38h1e0a361_1 - requests=2.23.0=pyh8c360ce_2 - rfc3986=1.3.2=py_0 - rope=0.16.0=py_0 - scikit-learn=0.22.2.post1=py38hcdab131_0 - scipy=1.4.1=py38h18bccfc_2 - seaborn=0.11.0=0 - 
seaborn-base=0.11.0=py_0 - send2trash=1.5.0=py_0 - setuptools=46.0.0=py38h32f6830_2 - shellingham=1.3.2=py_0 - simpervisor=0.3=py_1 - six=1.14.0=py_1 - snappy=1.1.8=he1b5a44_1 - sniffio=1.1.0=py38h32f6830_2 - sortedcontainers=2.1.0=py_0 - soupsieve=2.0.1=py38h32f6830_0 - sqlite=3.30.1=hcee41ef_0 - statsmodels=0.12.0=py38h1e0a361_0 - tangled-up-in-unicode=0.0.6=pyh9f0ad1d_0 - tblib=1.6.0=py_0 - terminado=0.8.3=py38h32f6830_1 - testpath=0.4.4=py_0 - text-unidecode=1.3=py_0 - thrift-cpp=0.13.0=h62aa4f2_2 - tk=8.6.10=hed695b0_0 - toml=0.10.0=py_0 - toolz=0.10.0=py_0 - tornado=6.0.4=py38h1e0a361_1 - tqdm=4.48.2=pyh9f0ad1d_0 - traitlets=4.3.3=py38h32f6830_1 - typed-ast=1.4.1=py38h516909a_0 - typer=0.3.1=py_0 - typing_extensions=3.7.4.1=py38h32f6830_1 - unidecode=1.1.1=py_0 - urllib3=1.25.7=py38h32f6830_1 - visions=0.5.0=pyh9f0ad1d_0 - wcwidth=0.1.8=py_0 - webencodings=0.5.1=py_1 - wheel=0.34.2=py_1 - whichcraft=0.6.1=py_0 - widgetsnbextension=3.5.1=py38_0 - xarray=0.15.1=py_0 - xlrd=1.2.0=py_0 - xorg-libxau=1.0.9=h14c3975_0 - xorg-libxdmcp=1.1.3=h516909a_0 - xz=5.2.4=h516909a_1002 - yaml=0.2.2=h516909a_1 - yarl=1.3.0=py38h516909a_1000 - zeromq=4.3.2=he1b5a44_2 - zict=2.0.0=py_0 - zipp=3.1.0=py_0 - zlib=1.2.11=h516909a_1006 - zstd=1.4.4=h3b9ef0a_2 - pip: - aiometer==0.2.1 - anyio==1.3.0 - async-generator==1.10 - pyqt5-sip==4.19.18 - pyqtwebengine==5.12.1 ```


AmandaSouzaMachado commented 3 years ago

I tried to reproduce the issue, and it seems to be solved.

Chrome:

[screenshot: report rendered in Chrome]

Firefox:

[screenshot: report rendered in Firefox]

The issue can be closed.