sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
1.99k stars, 203 forks

Not able to run the report example #809

Closed aborruso closed 2 years ago

aborruso commented 2 years ago

Hi, if I run

from dataprep.datasets import load_dataset
df = load_dataset("titanic")

from dataprep.eda import create_report
report = create_report(df)
report

I get the error below. How can I solve it?

Thank you

AttributeError                            Traceback (most recent call last)
<ipython-input-3-098ba8c79d38> in <module>
      1 from dataprep.eda import create_report
----> 2 report = create_report(df)
      3 report

~/.local/lib/python3.9/site-packages/dataprep/eda/create_report/__init__.py in create_report(df, config, display, title, mode, progress)
     66         "resources": INLINE.render(),
     67         "title": title,
---> 68         "components": format_report(df, cfg, mode, progress),
     69     }
     70     template_base = ENV_LOADER.get_template("base.html")

~/.local/lib/python3.9/site-packages/dataprep/eda/create_report/formatter.py in format_report(df, cfg, mode, progress)
     74         if mode == "basic":
     75             edaframe = EDAFrame(df)
---> 76             comps = format_basic(edaframe, cfg)
     77         # elif mode == "full":
     78         #     comps = format_full(df)

~/.local/lib/python3.9/site-packages/dataprep/eda/create_report/formatter.py in format_basic(df, cfg)
    272 
    273     setattr(getattr(cfg, "plot"), "report", True)
--> 274     data, completions = basic_computations(df, cfg)
    275     with catch_warnings():
    276         filterwarnings(

~/.local/lib/python3.9/site-packages/dataprep/eda/create_report/formatter.py in basic_computations(df, cfg)
    381     # pylint: disable=too-many-branches, protected-access, too-many-locals
    382 
--> 383     variables_data = _compute_variables(df, cfg)
    384     overview_data = _compute_overview(df, cfg)
    385     data: Dict[str, Any] = {**variables_data, **overview_data}

~/.local/lib/python3.9/site-packages/dataprep/eda/create_report/formatter.py in _compute_variables(df, cfg)
    316                     data[col] = nom_comps(srs, cfg)
    317                 elif isinstance(dtype, Continuous):
--> 318                     data[col] = cont_comps(df.frame[col], cfg)
    319                 elif isinstance(dtype, DateTime):
    320                     data[col] = {}

~/.local/lib/python3.9/site-packages/dataprep/eda/distribution/compute/univariate.py in cont_comps(srs, cfg)
    198             data["norm"] = normaltest(data["hist"][0])
    199     if cfg.hist.enable and cfg.insight.enable:
--> 200         data["chisq"] = chisquare(data["hist"][0])
    201     # compute only the required amount of quantiles
    202     if cfg.qqnorm.enable:

~/.local/lib/python3.9/site-packages/dask/array/stats.py in chisquare(f_obs, f_exp, ddof, axis)
    134 @derived_from(scipy.stats)
    135 def chisquare(f_obs, f_exp=None, ddof=0, axis=0):
--> 136     return power_divergence(f_obs, f_exp=f_exp, ddof=ddof, axis=axis, lambda_="pearson")
    137 
    138 

~/.local/lib/python3.9/site-packages/dask/array/stats.py in power_divergence(f_obs, f_exp, ddof, axis, lambda_)
    142     if isinstance(lambda_, str):
    143         # TODO: public api
--> 144         if lambda_ not in scipy.stats.stats._power_div_lambda_names:
    145             names = repr(list(scipy.stats.stats._power_div_lambda_names.keys()))[1:-1]
    146             raise ValueError(

~/.local/lib/python3.9/site-packages/scipy/stats/stats.py in __getattr__(name)
     52 def __getattr__(name):
     53     if name not in __all__:
---> 54         raise AttributeError(
     55             "scipy.stats.stats is deprecated and has no attribute "
     56             f"{name}. Try looking in scipy.stats instead.")

AttributeError: scipy.stats.stats is deprecated and has no attribute _power_div_lambda_names. Try looking in scipy.stats instead.

jinglinpeng commented 2 years ago

Hi @aborruso, thanks for the issue. It seems to be caused by the scipy version. Which version do you have installed? Maybe you can try a different one (e.g., 1.5.4).
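
To answer the version question, the installed versions can be read with the standard library. This is a minimal sketch, not part of dataprep; the helper name installed_version is hypothetical, and the package list is simply the set of libraries that appear in the traceback.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Libraries that appear in the traceback above.
for name in ("dataprep", "dask", "scipy"):
    print(name, installed_version(name))
```

Posting this output alongside the traceback makes it easier to tell which combination of versions triggers the error.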

aborruso commented 2 years ago

I have 1.8.0

jinglinpeng commented 2 years ago

Thanks for the info. dataprep currently uses an old dask version, which accesses a private attribute of the deprecated scipy.stats.stats module, so an older scipy version may solve the issue. Someone else ran into the same problem, and scipy 1.5.4 solved it. Could you try it? We're also working on upgrading the dask version in dataprep.
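
The failure mode in the traceback can be imitated in a few lines: a module defines a module-level __getattr__ (PEP 562) that raises AttributeError for names it no longer exposes, and a caller that reaches for a private name breaks. The module legacy_stats and its contents below are hypothetical stand-ins, not scipy's actual code.

```python
import types

# Hypothetical stand-in for a deprecated module; this is not scipy's real code.
legacy = types.ModuleType("legacy_stats")
legacy.__all__ = ["chisquare"]
legacy.chisquare = lambda values: sum(values)  # placeholder public function


def _module_getattr(name):
    # Python calls this only when normal attribute lookup on the module fails.
    raise AttributeError(
        f"legacy_stats is deprecated and has no attribute {name}."
    )


legacy.__getattr__ = _module_getattr


def uses_private_name():
    """Mimic a caller that depends on a private attribute of the module."""
    try:
        return legacy._power_div_lambda_names
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(uses_private_name())
```

Public names keep working, while any private name the caller relied on fails at access time, which is why the error only shows up once that code path runs.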

aborruso commented 2 years ago

I pinned scipy to 1.5.4 and it works.

You should pin this version at installation, or show a warning and stop if the installed version is not compatible.

Thank you very much
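
The "alert and stop" suggestion could look roughly like the guard below. It is a sketch under stated assumptions, not dataprep's code: the names parse_version and check_scipy are hypothetical, and 1.5.4 is used as the threshold only because it is the one version confirmed to work in this thread. Versions between 1.5.4 and 1.8.0 are untested here and are treated as unverified.

```python
import re
import warnings

# Assumption: 1.5.4 is the only scipy version confirmed to work in this thread.
KNOWN_GOOD_SCIPY = (1, 5, 4)


def parse_version(text):
    """Turn '1.8.0' into (1, 8, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '1.5'
        parts.append(0)
    return tuple(parts)


def check_scipy(installed, known_good=KNOWN_GOOD_SCIPY, strict=False):
    """Return True if `installed` is not newer than the known-good version.

    Otherwise emit a warning, or raise RuntimeError when strict=True.
    """
    if parse_version(installed) <= known_good:
        return True
    wanted = ".".join(str(n) for n in known_good)
    message = (
        f"scipy {installed} has not been verified; "
        f"scipy {wanted} is known to work."
    )
    if strict:
        raise RuntimeError(message)
    warnings.warn(message)
    return False
```

A caller could pass it the string returned by importlib.metadata.version("scipy") before building the report, with strict=True to stop early instead of failing deep inside the computation.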