Closed pvojnisek closed 4 years ago
The analysis here was wrong, so I have deleted it.
Thank you for your reply, @loopyme. I have tried these settings:
plot:
    histogram:
        bins: 6
        bayesian_blocks_bins: yes
The result is not what I expected. It looks like this:
The histogram should be built from this table:
I think your idea about plot.histogram.bins is great and straightforward. I can imagine that someone will need to define a different number of bins for each variable. Is it possible to define that somehow?
After some tests, I found my error, and I am very sorry that I pointed to the wrong location for the bug. I think the cause of this problem is complex; the most direct reason (which may not be the real reason) comes from here:
bayesian_blocks_bins can do something with the histogram, but it doesn't seem to work the way @pvojnisek wants either. I believe the problem is not in the adjustment of the default config, but in the fact that plot.histogram.bins does not control the render behavior.
Different config for each variable is currently not supported, however.
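Until something like this is supported, a per-variable bin count could be worked out outside the library. The sketch below is only an illustration: bins_per_variable, the column names, and the sample values are all hypothetical and are not part of pandas-profiling. It maps each column to its number of distinct values, capped at a maximum.

```python
def bins_per_variable(columns, max_bins=10):
    """Map each column name to a bin count.

    Hypothetical helper (not a pandas-profiling API): uses the number of
    distinct values in the column, capped at max_bins.
    """
    return {
        name: min(len(set(values)), max_bins)
        for name, values in columns.items()
    }


# Made-up example columns, for illustration only.
columns = {
    "score": [0, 1, 2, 3, 4, 5, 5, 3],          # discrete, six distinct values
    "weight": [i * 1.5 for i in range(40)],     # many distinct values
}

bins = bins_per_variable(columns)
print(bins)
```

A discrete variable with values 0 to 5 would get six bins under this rule, while a variable with many distinct values would fall back to the cap.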
@sbrugman Any comment or existing solution on the bug? I will continue to work on this issue if necessary.
Thanks for reporting this, @pvojnisek. As @loopyme points out, the bin size used to be (unintentionally) hard-coded in render_real.py. The next release will pre-compute histograms earlier in the process anyway, which, in addition to enabling more efficient parallelization, will include a fix for this problem.
The v2.9.0rc1 release is out, and should resolve this issue. Until this version is fully released, you can install it via pip in the following way:
pip install --pre -U pandas-profiling
It would be very helpful to know if the release candidate adequately solves the issue.
Describe the bug
I have a data table of 79 observations and 50 variables. I generate the profiling report regularly, and I used pandas-profiling 2.1 for quite a long time. Some variables take the discrete values (0, 1, 2, 3, 4, 5). The histogram looked like this in version 2.1:
The same report in version 2.8 looks like this:
In version 2.8, 10 bins are created, which is not the best choice in this case. I tried to set the number of bins manually in the YAML config file, but without success. I am not sure whether this is a bug or only my misunderstanding of the configuration and parameters. Please help me solve this problem! Thanks a lot!
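To see why 10 bins suit this kind of variable poorly, here is a minimal sketch in plain Python. It assumes simple equal-width binning over the range of the data, which may differ from what pandas-profiling does internally, and the sample values are made up rather than taken from the real table.

```python
def equal_width_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is clamped into the last bin (closed right edge).
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample with the discrete values 0..5 (not the reporter's data).
values = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

ten = equal_width_counts(values, 10)
six = equal_width_counts(values, 6)

print("10 bins:", ten)
print(" 6 bins:", six)
```

With 10 bins over the range 0 to 5, each bin is 0.5 wide, so several bins can never receive a value and the bars are spaced unevenly. With 6 bins, each discrete value gets its own bar.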
To Reproduce
We would need to reproduce your scenario before being able to resolve it.
Data:
The values are (0, 1, 2, 3, 4, 5) values.
Code: Preferably, use this code format:
from pandas_profiling import ProfileReport
profile = ProfileReport(df, config_file='profiler_settings.yml')
profile.to_file("profiler_report.html")
Version information:
pip: If you are using pip, run pip freeze
in your environment and report the results. The list of packages can be rather long, you can use the snippet below to collapse the output.Click to expand Version information: alabaster==0.7.12 altgraph==0.16.1 anaconda-client==1.7.2 anaconda-navigator==1.9.7 anaconda-project==0.8.2 appdirs==1.4.3 asn1crypto==0.24.0 astroid==2.2.5 astropy==4.0.1.post1 atomicwrites==1.3.0 attrs==19.3.0 Babel==2.6.0 backcall==0.1.0 backports.os==0.1.1 backports.shutil-get-terminal-size==1.0.0 beautifulsoup4==4.7.1 bitarray==0.8.3 bkcharts==0.2 bleach==3.1.0 bokeh==1.0.4 boto==2.49.0 Bottleneck==1.2.1 cached-property==1.5.1 certifi==2019.3.9 cffi==1.12.2 chardet==3.0.4 Click==7.0 cloudpickle==0.8.0 clyent==1.2.2 colorama==0.4.1 conda==4.6.11 conda-build==3.17.8 conda-verify==3.1.1 confuse==1.0.0 contextlib2==0.5.5 cryptography==2.6.1 cycler==0.10.0 Cython==0.29.6 cytoolz==0.9.0.1 dask==2.5.2 decorator==4.4.0 defusedxml==0.5.0 distributed==2.5.2 docutils==0.14 entrypoints==0.3 et-xmlfile==1.0.1 fastcache==1.0.2 filelock==3.0.10 Flask==1.0.2 funcsigs==1.0.2 future==0.17.1 gevent==1.4.0 glob2==0.6 gmpy2==2.0.8 greenlet==0.4.15 h5py==2.9.0 heapdict==1.0.0 html5lib==1.0.1 htmlmin==0.1.12 idna==2.8 ImageHash==4.1.0 imageio==2.5.0 imagesize==1.1.0 importlib-metadata==0.0.0 ipykernel==5.1.0 ipython==7.4.0 ipython-genutils==0.2.0 ipywidgets==7.5.1 isodate==0.6.0 isort==4.3.16 itsdangerous==1.1.0 jdcal==1.4 jedi==0.13.3 jeepney==0.4 Jinja2==2.11.2 joblib==0.15.1 jsonschema==3.0.1 jupyter==1.0.0 jupyter-client==5.2.4 jupyter-console==6.0.0 jupyter-core==4.4.0 jupyterlab==0.35.4 jupyterlab-server==0.2.0 keyring==18.0.0 kiwisolver==1.0.1 lazy-object-proxy==1.3.1 libarchive-c==2.8 lief==0.9.0 llvmlite==0.28.0 locket==0.2.0 lxml==4.3.2 MarkupSafe==1.1.1 matplotlib==3.2.1 mccabe==0.6.1 missingno==0.4.2 mistune==0.8.4 mkl-fft==1.0.10 mkl-random==1.0.2 modin==0.6.1 more-itertools==6.0.0 mpmath==1.1.0 msgpack==0.6.1 multipledispatch==0.6.0 navigator-updater==0.2.1 nbconvert==5.4.1 nbformat==4.4.0 
networkx==2.4 nltk==3.4 nose==1.3.7 notebook==5.7.8 numba==0.43.1 numexpr==2.6.9 numpy==1.16.2 numpydoc==0.8.0 olefile==0.46 openpyxl==2.6.1 packaging==19.0 pandas==1.0.3 pandas-profiling==2.8.0 pandocfilters==1.4.2 parso==0.3.4 partd==0.3.10 path.py==11.5.0 pathlib2==2.3.3 patsy==0.5.1 pep8==1.7.1 pexpect==4.6.0 phik==0.9.12 pickleshare==0.7.5 Pillow==5.4.1 pkginfo==1.5.0.1 pluggy==0.9.0 ply==3.11 prometheus-client==0.6.0 prompt-toolkit==2.0.9 protobuf==3.10.0 psutil==5.6.1 ptyprocess==0.6.0 py==1.8.0 pycodestyle==2.5.0 pycosat==0.6.3 pycparser==2.19 pycrypto==2.6.1 pycurl==7.43.0.2 pyflakes==2.1.1 Pygments==2.3.1 PyInstaller==3.5 pylint==2.3.1 pyodbc==4.0.26 pyOpenSSL==19.0.0 pyparsing==2.3.1 pyrsistent==0.14.11 pyserial==3.4 PySimpleGUI==4.1.0 PySocks==1.6.8 pytest==4.3.1 pytest-arraydiff==0.3 pytest-astropy==0.5.0 pytest-doctestplus==0.3.0 pytest-openfiles==0.3.2 pytest-pylint==0.14.0 pytest-remotedata==0.3.1 python-dateutil==2.8.0 pytz==2018.9 PyWavelets==1.0.2 PyYAML==5.1 pyzmq==18.0.0 QtAwesome==0.5.7 qtconsole==4.4.3 QtPy==1.7.0 ray==0.7.3 redis==3.3.8 requests==2.23.0 requests-toolbelt==0.9.1 retrying==1.3.3 rope==0.12.0 ruamel-yaml==0.15.46 scikit-image==0.14.2 scikit-learn==0.20.3 scipy==1.4.1 seaborn==0.9.0 SecretStorage==3.1.1 Send2Trash==1.5.0 simplegeneric==0.8.1 singledispatch==3.4.0.3 six==1.12.0 snowballstemmer==1.2.1 sortedcollections==1.1.2 sortedcontainers==2.1.0 soupsieve==1.8 Sphinx==1.8.5 sphinxcontrib-websupport==1.1.0 spyder==3.3.3 spyder-kernels==0.4.2 SQLAlchemy==1.3.1 statsmodels==0.9.0 sympy==1.3 tables==3.5.1 tangled-up-in-unicode==0.0.6 tblib==1.3.2 terminado==0.8.1 testpath==0.4.2 toolz==0.9.0 tornado==6.0.2 tqdm==4.46.0 traitlets==4.3.2 typed-ast==1.4.0 unicodecsv==0.14.1 urllib3==1.24.1 virtualenv==16.7.9 visions==0.4.4 wcwidth==0.1.7 webencodings==0.5.1 Werkzeug==0.14.1 widgetsnbextension==3.5.1 wrapt==1.11.1 wurlitzer==1.0.2 xlrd==1.2.0 XlsxWriter==1.1.5 xlwt==1.3.0 zeep==3.4.0 zict==0.1.4 zipp==0.3.3