ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

Google Colab #534

Closed: sehHeiden closed this issue 3 years ago

sehHeiden commented 3 years ago

Running on Google Colab

I was trying to run profiling on Google Colab with profile = ProfileReport(train_data), but it failed with this error:

concat() got an unexpected keyword argument 'join_axes'
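This error most likely comes from an older pandas-profiling release in the environment that still passes join_axes to pandas.concat; pandas deprecated join_axes in 0.25 and removed it in 1.0. The minimal sketch below, assuming pandas >= 1.0 is installed, reproduces the failure with two toy frames (not the profiler's real data) and shows the reindex-based equivalent of the old call:

```python
import pandas as pd

# Toy frames standing in for intermediate results inside the profiler
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [10, 20]}, index=[1, 2])

# Old style: align the result to left's index via join_axes.
# On pandas >= 1.0 this raises:
#   TypeError: concat() got an unexpected keyword argument 'join_axes'
try:
    pd.concat([left, right], axis=1, join_axes=[left.index])
    old_style_failed = False
except TypeError as err:
    old_style_failed = True
    print(err)

# Equivalent on pandas >= 1.0: concatenate, then reindex to left's index
combined = pd.concat([left, right], axis=1).reindex(left.index)
print(combined)
```

The error therefore cannot be fixed from user code; the profiling package itself has to be a release that no longer uses join_axes.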

I therefore tried all the Google Colab examples for pandas-profiling, but they either no longer exist or no longer work: the syntax has changed, and from pandas_profiling.utils.cache import cache_file doesn't seem to exist anymore.
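Whether imports such as pandas_profiling.utils.cache exist depends on which release is actually installed, so a useful first step is to print the versions the runtime has. A minimal sketch using the standard-library importlib.metadata (Python 3.8+, so newer than the Python 3.6 runtime visible in the tracebacks in this thread; older runtimes would need the importlib_metadata backport). The helper name installed_version is hypothetical:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distributions worth checking; both the old and the new project name are listed
for name in ("pandas", "pandas-profiling", "ydata-profiling", "matplotlib"):
    print(f"{name}: {installed_version(name)}")
```

If pandas-profiling reports an old version here, the examples written for newer releases are expected to fail on import.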

On my own machines the minimal parameter sometimes helped, but adding it, like adding any other parameter, gives this error:

_plot_histogram() got an unexpected keyword argument 'minimal'
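An "unexpected keyword argument" error means the function object loaded at runtime does not declare that parameter, which typically happens when the calling code and the called code come from different releases. The sketch below checks, with inspect.signature, whether a callable accepts a given keyword; the two plotting functions are hypothetical stand-ins, not pandas-profiling's actual internals:

```python
import inspect


def accepts_kwarg(func, name):
    """True if func can be called with name=... as a keyword argument."""
    for p in inspect.signature(func).parameters.values():
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs swallows any keyword
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical old and new versions of a plotting helper
def plot_histogram_old(series, bins=10):
    return bins


def plot_histogram_new(series, bins=10, minimal=False):
    return bins


print(accepts_kwarg(plot_histogram_old, "minimal"))  # False
print(accepts_kwarg(plot_histogram_new, "minimal"))  # True
```

When a new caller passes minimal=... to an old function, Python raises exactly this kind of TypeError.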

Data: House Pricing CSV from Kaggle


> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 1460 entries, 0 to 1459
> Data columns (total 81 columns):
>  #   Column         Non-Null Count  Dtype  
> ---  ------         --------------  -----  
>  0   Id             1460 non-null   int64  
>  1   MSSubClass     1460 non-null   int64  
>  2   MSZoning       1460 non-null   object 
>  3   LotFrontage    1201 non-null   float64
>  4   LotArea        1460 non-null   int64  
>  5   Street         1460 non-null   object 
>  6   Alley          91 non-null     object 
>  7   LotShape       1460 non-null   object 
>  8   LandContour    1460 non-null   object 
>  9   Utilities      1460 non-null   object 
>  10  LotConfig      1460 non-null   object 
>  11  LandSlope      1460 non-null   object 
>  12  Neighborhood   1460 non-null   object 
>  13  Condition1     1460 non-null   object 
>  14  Condition2     1460 non-null   object 
>  15  BldgType       1460 non-null   object 
>  16  HouseStyle     1460 non-null   object 
>  17  OverallQual    1460 non-null   int64  
>  18  OverallCond    1460 non-null   int64  
>  19  YearBuilt      1460 non-null   int64  
>  20  YearRemodAdd   1460 non-null   int64  
>  21  RoofStyle      1460 non-null   object 
>  22  RoofMatl       1460 non-null   object 
>  23  Exterior1st    1460 non-null   object 
>  24  Exterior2nd    1460 non-null   object 
>  25  MasVnrType     1452 non-null   object 
>  26  MasVnrArea     1452 non-null   float64
>  27  ExterQual      1460 non-null   object 
>  28  ExterCond      1460 non-null   object 
>  29  Foundation     1460 non-null   object 
>  30  BsmtQual       1423 non-null   object 
>  31  BsmtCond       1423 non-null   object 
>  32  BsmtExposure   1422 non-null   object 
>  33  BsmtFinType1   1423 non-null   object 
>  34  BsmtFinSF1     1460 non-null   int64  
>  35  BsmtFinType2   1422 non-null   object 
>  36  BsmtFinSF2     1460 non-null   int64  
>  37  BsmtUnfSF      1460 non-null   int64  
>  38  TotalBsmtSF    1460 non-null   int64  
>  39  Heating        1460 non-null   object 
>  40  HeatingQC      1460 non-null   object 
>  41  CentralAir     1460 non-null   object 
>  42  Electrical     1459 non-null   object 
>  43  1stFlrSF       1460 non-null   int64  
>  44  2ndFlrSF       1460 non-null   int64  
>  45  LowQualFinSF   1460 non-null   int64  
>  46  GrLivArea      1460 non-null   int64  
>  47  BsmtFullBath   1460 non-null   int64  
>  48  BsmtHalfBath   1460 non-null   int64  
>  49  FullBath       1460 non-null   int64  
>  50  HalfBath       1460 non-null   int64  
>  51  BedroomAbvGr   1460 non-null   int64  
>  52  KitchenAbvGr   1460 non-null   int64  
>  53  KitchenQual    1460 non-null   object 
>  54  TotRmsAbvGrd   1460 non-null   int64  
>  55  Functional     1460 non-null   object 
>  56  Fireplaces     1460 non-null   int64  
>  57  FireplaceQu    770 non-null    object 
>  58  GarageType     1379 non-null   object 
>  59  GarageYrBlt    1379 non-null   float64
>  60  GarageFinish   1379 non-null   object 
>  61  GarageCars     1460 non-null   int64  
>  62  GarageArea     1460 non-null   int64  
>  63  GarageQual     1379 non-null   object 
>  64  GarageCond     1379 non-null   object 
>  65  PavedDrive     1460 non-null   object 
>  66  WoodDeckSF     1460 non-null   int64  
>  67  OpenPorchSF    1460 non-null   int64  
>  68  EnclosedPorch  1460 non-null   int64  
>  69  3SsnPorch      1460 non-null   int64  
>  70  ScreenPorch    1460 non-null   int64  
>  71  PoolArea       1460 non-null   int64  
>  72  PoolQC         7 non-null      object 
>  73  Fence          281 non-null    object 
>  74  MiscFeature    54 non-null     object 
>  75  MiscVal        1460 non-null   int64  
>  76  MoSold         1460 non-null   int64  
>  77  YrSold         1460 non-null   int64  
>  78  SaleType       1460 non-null   object 
>  79  SaleCondition  1460 non-null   object 
>  80  SalePrice      1460 non-null   int64  
> dtypes: float64(3), int64(35), object(43)
> memory usage: 924.0+ KB

Additional context: running it on DataLore didn't work either.

sbrugman commented 3 years ago

Have you tried updating?

(https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/installation.html)

sehHeiden commented 3 years ago

I tried running the following:

import sys
!{sys.executable} -m pip install -U pandas-profiling[notebook]
!jupyter nbextension enable --py widgetsnbextension

On Colab, this gave:


FileNotFoundError Traceback (most recent call last)

/usr/local/lib/python3.6/dist-packages/matplotlib/style/core.py in use(style)
    113             try:
--> 114                 rc = rc_params_from_file(style, use_default_template=False)
    115                 _apply_style(rc)

7 frames

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/pandas_profiling/pandas_profiling.mplstyle'

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)

/usr/local/lib/python3.6/dist-packages/matplotlib/style/core.py in use(style)
    118                     "{!r} not found in the style library and input is not a "
    119                     "valid URL or path; see style.available for list of "
--> 120                     "available styles".format(style))
    121
    122

OSError: '/usr/local/lib/python3.6/dist-packages/pandas_profiling/pandas_profiling.mplstyle' not found in the style library and input is not a valid URL or path; see style.available for list of available styles

And on DataLore:

A Jupyter widget could not be displayed because the widget state could not be found. This could happen if the kernel storing the widget is no longer available, or if the widget state was not saved in the notebook. You may be able to create the widget by running the appropriate cells.

This happens when ProfileReport is called.

sbrugman commented 3 years ago

Do these errors persist after restarting the kernel?

sehHeiden commented 3 years ago

It seems that restarting the kernel after running:

import sys
!{sys.executable} -m pip install -U pandas-profiling[notebook]
!jupyter nbextension enable --py widgetsnbextension

works reproducibly on Colab. It also worked once on DataLore, but I could not reproduce that. It would be great if the Colab examples could be updated.
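A likely reason the restart matters: pip replaces the package files on disk, but the running kernel keeps the modules it already imported, so old code can end up looking for files (such as the .mplstyle file in the traceback above) at paths the new release no longer uses. A minimal sketch that flags this situation, assuming the imported module exposes a __version__ string that matches its distribution's version; the helper name needs_restart is hypothetical:

```python
import sys
from importlib import metadata


def needs_restart(module_name, dist_name):
    """True if module_name is already imported at a different version than
    the distribution dist_name currently installed on disk."""
    module = sys.modules.get(module_name)
    if module is None:
        return False  # not imported yet: the next import picks up the new files
    loaded = getattr(module, "__version__", None)
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False  # nothing installed to compare against
    return loaded is not None and loaded != installed


# Example: run in the same kernel right after `pip install -U ...`
if needs_restart("pandas_profiling", "pandas-profiling"):
    print("Restart the runtime before calling ProfileReport again.")
```

On Colab the restart itself is done from the Runtime menu; this check only tells you whether it is needed.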

sbrugman commented 3 years ago

What do you mean by reproducibly? That you could reproduce the error, or the results?

If this fixed your issue, we should definitely add it to the notebooks!

rbarman commented 3 years ago

I had the same issue in Google Colab. Meresmata's code snippet fixed it.

elephantoid commented 2 years ago

Has anyone solved this problem? I tried everything in this issue, but I still can't use the profile function. Thank you.