ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

Issue #496 AbstractMethodError when using pandas_profile #685

Open JamesMDorame opened 3 years ago

JamesMDorame commented 3 years ago

Describe the bug

To Reproduce

**Version information:**

Additional context

JamesMDorame commented 3 years ago

Here is more information for this bug.

When trying to generate a report the following error occurs: AbstractMethodError: This method must be defined in the concrete class type

Environment:

  • Windows 10
  • Python 3.7
  • jupyter notebook
  • numpy 1.16.2
  • pandas-profiling 2.10.0
  • Attachments: PackageList.txt, PackageManager.txt

Code to reproduce:

import numpy as np
import pandas_profiling
df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=["a", "b", "c", "d", "e"]
)
pandas_profiling.ProfileReport(df)

Traceback:

AbstractMethodError                       Traceback (most recent call last)
in
      5     columns=["a", "b", "c", "d", "e"]
      6 )
----> 7 pandas_profiling.ProfileReport(df)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas_profiling-2.10.0-py3.7.egg\pandas_profiling\profile_report.py in __init__(self, df, minimal, explorative, sensitive, dark_mode, orange_mode, sample, config_file, lazy, **kwargs)
     94         if df is not None:
     95             # preprocess df
---> 96             self.df = self.preprocess(df)
     97
     98         if not lazy:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas_profiling-2.10.0-py3.7.egg\pandas_profiling\profile_report.py in preprocess(df)
    457
    458         # Ensure that columns are strings
--> 459         df.columns = df.columns.astype("str")
    460         return df
    461

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in astype(self, dtype, copy)
    732     @Appender(_index_shared_docs['astype'])
    733     def astype(self, dtype, copy=True):
--> 734         if is_dtype_equal(self.dtype, dtype):
    735             return self.copy() if copy else self
    736

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\common.py in is_dtype_equal(source, target)
    773     try:
    774         source = _get_dtype(source)
--> 775         target = _get_dtype(target)
    776         return source == target
    777     except (TypeError, AttributeError):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\common.py in _get_dtype(arr_or_dtype)
   1840         arr_or_dtype = arr_or_dtype.dtype
   1841
-> 1842     return pandas_dtype(arr_or_dtype)
   1843
   1844

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\common.py in pandas_dtype(dtype)
   2002
   2003     # registered extension types
-> 2004     result = registry.find(dtype)
   2005     if result is not None:
   2006         return result

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\dtypes.py in find(self, dtype)
     87         for dtype_type in self.dtypes:
     88             try:
---> 89                 return dtype_type.construct_from_string(dtype)
     90             except TypeError:
     91                 pass

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\base.py in construct_from_string(cls, string)
    292         ...                 "'{}'".format(cls, string))
    293         """
--> 294         raise AbstractMethodError(cls)

AbstractMethodError: This method must be defined in the concrete class type

JamesMDorame commented 3 years ago

Information added to comment on issue. Thank you

On Wed, Feb 3, 2021 at 8:03 AM Simon Brugman notifications@github.com wrote:

Could you provide the minimal information to reproduce this error? This guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports can help crafting a minimal bug report.

  • the minimal code you are using to generate the report
  • which environment you are using:
    • operating system (e.g. Windows, Linux, Mac)
    • Python version (e.g. 3.7)
    • jupyter notebook, console or IDE such as PyCharm
    • Package manager (e.g. pip, or conda with conda info)
    • packages (pip freeze > packages.txt or conda list)
  • a sample or description of the dataset (df.head(), df.info())


sbrugman commented 3 years ago

Thanks @JamesMDorame. The issue in your environment seems to be with the code below, although I could not reproduce it. Could you please verify that the following snippet yields the same error:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
df.columns = df.columns.astype('str')

print(df)
print(df.columns)

(most likely the pandas dependency is outdated)
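
The environment listing earlier in the thread gives numpy 1.16.2 and pandas-profiling 2.10.0 but no pandas version, so the "outdated pandas" guess cannot be checked from the report alone. A minimal sketch for collecting the versions from inside the same notebook kernel might look like this. `collect_versions` is a hypothetical helper name, not part of pandas or pandas-profiling, and it only reads each module's `__version__` attribute where one exists:

```python
import importlib
import platform


def collect_versions(module_names):
    """Map each module name to its __version__, or to a note if the import fails."""
    versions = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:
            # Record the failure instead of aborting, so one broken package
            # does not hide the versions of the others.
            versions[name] = "import failed: {}".format(type(exc).__name__)
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(
        ["pandas", "numpy", "pandas_profiling"]
    ).items():
        print(name, version)
```

Running it in the notebook that produced the traceback matters, because a console or a different kernel may resolve to another site-packages directory.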

JamesMDorame commented 3 years ago

That code does not return the error.

Abbreviated output:

91  0.367833  0.188482  0.658861  0.485134  0.400419
92  0.624880  0.229420  0.591141  0.094314  0.523189
93  0.231544  0.340217  0.738064  0.472836  0.838604
94  0.586591  0.479526  0.074745  0.643456  0.519543
95  0.524571  0.471131  0.256094  0.456682  0.564187
96  0.860796  0.816732  0.590780  0.325540  0.619459
97  0.324934  0.872229  0.843747  0.316590  0.683571
98  0.795500  0.579887  0.371556  0.363243  0.177154
99  0.283282  0.459053  0.546533  0.257715  0.667444

[100 rows x 5 columns]
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

sbrugman commented 3 years ago

Based on the stack trace and the code I can't find a clear reason why this is happening. Any help to track down the root cause is appreciated.
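
One way to narrow this down, going only by the frames in the traceback: `find` loops over the registered extension dtype classes and calls `construct_from_string` on each, ignoring `TypeError`, and the call that failed ended up in the base-class implementation. That would be consistent with some registered dtype class not overriding `construct_from_string`, although the thread does not confirm it. The sketch below lists such classes. The helper names are hypothetical, and the registry import paths are private pandas internals that differ between versions, so treat them as assumptions:

```python
def find_unparseable_dtypes(dtype_classes, string="str"):
    """Return (class, exception) pairs for classes whose construct_from_string
    raises anything other than TypeError for the given string."""
    offenders = []
    for dtype_class in dtype_classes:
        try:
            dtype_class.construct_from_string(string)
        except TypeError:
            # The "this string is not my dtype" answer; the registry loop
            # in the traceback skips these as well.
            continue
        except Exception as exc:
            offenders.append((dtype_class, exc))
    return offenders


def registered_dtype_classes():
    """Best-effort lookup of pandas' private dtype registry (assumed locations)."""
    try:
        # Location shown in the traceback (older pandas).
        from pandas.core.dtypes.dtypes import registry
    except ImportError:
        try:
            # Assumed location in newer pandas releases.
            from pandas.core.dtypes.base import _registry as registry
        except ImportError:
            return []
    return list(registry.dtypes)


if __name__ == "__main__":
    # Import pandas_profiling first in the affected environment so that any
    # dtype registered as a side effect of that import is included.
    for dtype_class, exc in find_unparseable_dtypes(registered_dtype_classes()):
        print(dtype_class.__module__, dtype_class.__name__, repr(exc))
```

Comparing the output before and after `import pandas_profiling` could show whether the offending class comes from that import or from another installed package, which might explain why the standalone snippet above does not fail.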