ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License

Bug Report: SystemError: unknown opcode #1295

Open cj2001 opened 1 year ago

cj2001 commented 1 year ago

Current Behaviour

I am running in Jupyter Lab and just upgraded to Python 3.10.6. My initial dataset is in a Pandas dataframe with fewer than 8000 rows and fewer than 50 columns. When I profiled this dataframe, I received the error SystemError: unknown opcode. It also blows through my computer's memory and bogs down my browser. (I have nothing else running.)

I then tried it with a subset of the data. Here is my code:

import numpy as np
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_excel('../data/mendeley.xlsx')
test_df = df.head(10)
test_profile = ProfileReport(test_df)
test_profile

(Note that I also get the following error if I use .to_widgets().)

These 10 rows also crash. I receive the following error:

XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
XXX lineno: 77306, opcode: 36
---------------------------------------------------------------------------
SystemError                               Traceback (most recent call last)
File ~/notebook_env/lib/python3.10/site-packages/IPython/core/formatters.py:342, in BaseFormatter.__call__(self, obj)
    340     method = get_real_method(obj, self.print_method)
    341     if method is not None:
--> 342         return method()
    343     return None
    344 else:

File ~/notebook_env/lib/python3.10/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/profile_report.py:511, in ProfileReport._repr_html_(self)
    509 def _repr_html_(self) -> None:
    510     """The ipython notebook widgets user interface gets called by the jupyter notebook."""
--> 511     self.to_notebook_iframe()

File ~/notebook_env/lib/python3.10/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/profile_report.py:491, in ProfileReport.to_notebook_iframe(self)
    489 with warnings.catch_warnings():
    490     warnings.simplefilter("ignore")
--> 491     display(get_notebook_iframe(self.config, self))

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/report/presentation/flavours/widget/notebook.py:75, in get_notebook_iframe(config, profile)
     73     output = get_notebook_iframe_src(config, profile)
     74 elif attribute == IframeAttribute.srcdoc:
---> 75     output = get_notebook_iframe_srcdoc(config, profile)
     76 else:
     77     raise ValueError(
     78         f'Iframe Attribute can be "src" or "srcdoc" (current: {attribute}).'
     79     )

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/report/presentation/flavours/widget/notebook.py:29, in get_notebook_iframe_srcdoc(config, profile)
     27 width = config.notebook.iframe.width
     28 height = config.notebook.iframe.height
---> 29 src = html.escape(profile.to_html())
     31 iframe = f'<iframe width="{width}" height="{height}" srcdoc="{src}" frameborder="0" allowfullscreen></iframe>'
     33 return HTML(iframe)

File ~/notebook_env/lib/python3.10/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/profile_report.py:461, in ProfileReport.to_html(self)
    453 def to_html(self) -> str:
    454     """Generate and return complete template as lengthy string
    455         for using with frameworks.
    456 
   (...)
    459 
    460     """
--> 461     return self.html

File ~/notebook_env/lib/python3.10/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/profile_report.py:272, in ProfileReport.html(self)
    269 @property
    270 def html(self) -> str:
    271     if self._html is None:
--> 272         self._html = self._render_html()
    273     return self._html

File ~/notebook_env/lib/python3.10/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/profile_report.py:380, in ProfileReport._render_html(self)
    377 def _render_html(self) -> str:
    378     from ydata_profiling.report.presentation.flavours import HTMLReport
--> 380     report = self.report
    382     with tqdm(
    383         total=1, desc="Render HTML", disable=not self.config.progress_bar
    384     ) as pbar:
    385         html = HTMLReport(copy.deepcopy(report)).render(
    386             nav=self.config.html.navbar_show,
    387             offline=self.config.html.use_local_assets,
   (...)
    395             version=self.description_set["package"]["ydata_profiling_version"],
    396         )

File ~/notebook_env/lib/python3.10/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/profile_report.py:266, in ProfileReport.report(self)
    263 @property
    264 def report(self) -> Root:
    265     if self._report is None:
--> 266         self._report = get_report_structure(self.config, self.description_set)
    267     return self._report

File ~/notebook_env/lib/python3.10/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/profile_report.py:248, in ProfileReport.description_set(self)
    245 @property
    246 def description_set(self) -> Dict[str, Any]:
    247     if self._description_set is None:
--> 248         self._description_set = describe_df(
    249             self.config,
    250             self.df,
    251             self.summarizer,
    252             self.typeset,
    253             self._sample,
    254         )
    255     return self._description_set

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/describe.py:71, in describe(config, df, summarizer, typeset, sample)
     69 # Variable-specific
     70 pbar.total += len(df.columns)
---> 71 series_description = get_series_descriptions(
     72     config, df, summarizer, typeset, pbar
     73 )
     75 pbar.set_postfix_str("Get variable types")
     76 pbar.total += 1

File ~/notebook_env/lib/python3.10/site-packages/multimethod/__init__.py:315, in multimethod.__call__(self, *args, **kwargs)
    313 func = self[tuple(func(arg) for func, arg in zip(self.type_checkers, args))]
    314 try:
--> 315     return func(*args, **kwargs)
    316 except TypeError as ex:
    317     raise DispatchError(f"Function {func.__code__}") from ex

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/pandas/summary_pandas.py:99, in pandas_get_series_descriptions(config, df, summarizer, typeset, pbar)
     96 else:
     97     # TODO: use `Pool` for Linux-based systems
     98     with multiprocessing.pool.ThreadPool(pool_size) as executor:
---> 99         for i, (column, description) in enumerate(
    100             executor.imap_unordered(multiprocess_1d, args)
    101         ):
    102             pbar.set_postfix_str(f"Describe variable:{column}")
    103             series_description[column] = description

File /usr/lib/python3.10/multiprocessing/pool.py:873, in IMapIterator.next(self, timeout)
    871 if success:
    872     return value
--> 873 raise value

File /usr/lib/python3.10/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/pandas/summary_pandas.py:79, in pandas_get_series_descriptions.<locals>.multiprocess_1d(args)
     69 """Wrapper to process series in parallel.
     70 
     71 Args:
   (...)
     76     A tuple with column and the series description.
     77 """
     78 column, series = args
---> 79 return column, describe_1d(config, series, summarizer, typeset)

File ~/notebook_env/lib/python3.10/site-packages/multimethod/__init__.py:315, in multimethod.__call__(self, *args, **kwargs)
    313 func = self[tuple(func(arg) for func, arg in zip(self.type_checkers, args))]
    314 try:
--> 315     return func(*args, **kwargs)
    316 except TypeError as ex:
    317     raise DispatchError(f"Function {func.__code__}") from ex

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/pandas/summary_pandas.py:57, in pandas_describe_1d(config, series, summarizer, typeset)
     54     vtype = typeset.detect_type(series)
     56 typeset.type_schema[series.name] = vtype
---> 57 return summarizer.summarize(config, series, dtype=vtype)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/summarizer.py:39, in BaseSummarizer.summarize(self, config, series, dtype)
     31 def summarize(
     32     self, config: Settings, series: pd.Series, dtype: Type[VisionsBaseType]
     33 ) -> dict:
     34     """
     35 
     36     Returns:
     37         object:
     38     """
---> 39     _, _, summary = self.handle(str(dtype), config, series, {"type": str(dtype)})
     40     return summary

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/handler.py:62, in Handler.handle(self, dtype, *args, **kwargs)
     60 funcs = self.mapping.get(dtype, [])
     61 op = compose(funcs)
---> 62 return op(*args)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/handler.py:21, in compose.<locals>.func.<locals>.func2(*x)
     19     return f(*x)
     20 else:
---> 21     return f(*res)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/handler.py:21, in compose.<locals>.func.<locals>.func2(*x)
     19     return f(*x)
     20 else:
---> 21     return f(*res)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/handler.py:21, in compose.<locals>.func.<locals>.func2(*x)
     19     return f(*x)
     20 else:
---> 21     return f(*res)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/handler.py:17, in compose.<locals>.func.<locals>.func2(*x)
     16 def func2(*x) -> Any:
---> 17     res = g(*x)
     18     if type(res) == bool:
     19         return f(*x)

File ~/notebook_env/lib/python3.10/site-packages/multimethod/__init__.py:315, in multimethod.__call__(self, *args, **kwargs)
    313 func = self[tuple(func(arg) for func, arg in zip(self.type_checkers, args))]
    314 try:
--> 315     return func(*args, **kwargs)
    316 except TypeError as ex:
    317     raise DispatchError(f"Function {func.__code__}") from ex

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/summary_algorithms.py:65, in series_hashable.<locals>.inner(config, series, summary)
     63 if not summary["hashable"]:
     64     return config, series, summary
---> 65 return fn(config, series, summary)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/summary_algorithms.py:82, in series_handle_nulls.<locals>.inner(config, series, summary)
     79 if series.hasnans:
     80     series = series.dropna()
---> 82 return fn(config, series, summary)

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/pandas/describe_categorical_pandas.py:256, in pandas_describe_categorical_1d(config, series, summary)
    245     summary.update(
    246         histogram_compute(
    247             config,
   (...)
    252         )
    253     )
    255 if config.vars.cat.characters:
--> 256     summary.update(unicode_summary_vc(value_counts))
    258 if config.vars.cat.words:
    259     summary.update(word_summary_vc(value_counts, config.vars.cat.stop_words))

File ~/notebook_env/lib/python3.10/site-packages/ydata_profiling/model/pandas/describe_categorical_pandas.py:58, in unicode_summary_vc(vc)
     56 def unicode_summary_vc(vc: pd.Series) -> dict:
     57     try:
---> 58         from tangled_up_in_unicode import (  # type: ignore
     59             block,
     60             block_abbr,
     61             category,
     62             category_long,
     63             script,
     64         )
     65     except ImportError:
     66         from unicodedata import category as _category  # pylint: disable=import-error

File ~/notebook_env/lib/python3.10/site-packages/tangled_up_in_unicode/__init__.py:1
----> 1 from tangled_up_in_unicode.tangled_up_in_unicode_14_0_0 import (
      2     name,
      3     decimal,
      4     digit,
      5     numeric,
      6     combining,
      7     mirrored,
      8     decomposition,
      9     category,
     10     bidirectional,
     11     east_asian_width,
     12     script,
     13     block,
     14     age,
     15     unidata_version,
     16     combining_long,
     17     category_long,
     18     bidirectional_long,
     19     east_asian_width_long,
     20     script_abbr,
     21     block_abbr,
     22     age_long,
     23     prop_list,
     24     titlecase,
     25     lowercase,
     26     uppercase,
     27 )
     29 __version__ = "0.2.0"
     31 __all__ = [
     32     "name",
     33     "decimal",
   (...)
     57     "__version__",
     58 ]

File ~/notebook_env/lib/python3.10/site-packages/tangled_up_in_unicode/tangled_up_in_unicode_14_0_0.py:21
     19 from tangled_up_in_unicode.u14_0_0_data.derived_age_to_age_start import derived_age_to_age_start
     20 from tangled_up_in_unicode.u14_0_0_data.derived_age_to_age_end import derived_age_to_age_end
---> 21 from tangled_up_in_unicode.u14_0_0_data.unicode_data_to_name_start import unicode_data_to_name_start
     22 from tangled_up_in_unicode.u14_0_0_data.unicode_data_to_category_start import unicode_data_to_category_start
     23 from tangled_up_in_unicode.u14_0_0_data.unicode_data_to_category_end import unicode_data_to_category_end

File ~/notebook_env/lib/python3.10/site-packages/tangled_up_in_unicode/u14_0_0_data/unicode_data_to_name_start.py:77306
      1 unicode_data_to_name_start = {
      2     32: "SPACE",
      3     33: "EXCLAMATION MARK",
      4     34: "QUOTATION MARK",
      5     35: "NUMBER SIGN",
      6     36: "DOLLAR SIGN",
      7     37: "PERCENT SIGN",
      8     38: "AMPERSAND",
      9     39: "APOSTROPHE",
     10     40: "LEFT PARENTHESIS",
     11     41: "RIGHT PARENTHESIS",
     12     42: "ASTERISK",
     13     43: "PLUS SIGN",
.
.
.
143861 }

SystemError: unknown opcode

Please advise. Thank you!

Expected Behaviour

When I ran with Python 3.9.7 I received a normal profile in Jupyter Lab with no memory issues or errors.

Data Description

My dataset is publicly available from https://data.mendeley.com/datasets/6w4tzrs3yw.

Code that reproduces the bug

import numpy as np
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.read_excel('../data/mendeley.xlsx')
test_df = df.head(10)
test_profile = ProfileReport(test_df)
test_profile

### pandas-profiling version

v3.6.6

### Dependencies

```Text
(notebook_env) pop-osnotebook_env$ pip3 list
Package                  Version
------------------------ -----------
aiofiles                 22.1.0
aiosqlite                0.18.0
altair                   4.2.2
anyio                    3.6.2
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
asttokens                2.2.1
attrs                    22.2.0
Babel                    2.12.1
backcall                 0.2.0
beautifulsoup4           4.12.0
bleach                   6.0.0
blinker                  1.5
blis                     0.7.9
cachetools               5.3.0
catalogue                2.0.8
certifi                  2022.12.7
cffi                     1.15.1
charset-normalizer       3.1.0
click                    8.1.3
cloudpickle              2.2.1
comm                     0.1.3
confection               0.0.4
contourpy                1.0.7
cycler                   0.11.0
cymem                    2.0.7
dask                     2023.3.2
debugpy                  1.6.6
decorator                5.1.1
defusedxml               0.7.1
entrypoints              0.4
et-xmlfile               1.1.0
executing                1.2.0
fastjsonschema           2.16.3
fonttools                4.39.2
fqdn                     1.5.1
fsspec                   2023.3.0
gitdb                    4.0.10
GitPython                3.1.31
greenlet                 2.0.2
htmlmin                  0.1.12
idna                     3.4
ImageHash                4.3.1
importlib-metadata       6.1.0
interchange              2021.0.4
ipykernel                6.22.0
ipython                  8.11.0
ipython-genutils         0.2.0
ipywidgets               8.0.5
isoduration              20.11.0
jedi                     0.18.2
Jinja2                   3.1.2
joblib                   1.2.0
json5                    0.9.11
jsonpointer              2.3
jsonschema               4.17.3
jupyter_client           8.1.0
jupyter_core             5.3.0
jupyter-events           0.6.3
jupyter_server           2.5.0
jupyter_server_fileid    0.8.0
jupyter_server_terminals 0.4.4
jupyter_server_ydoc      0.8.0
jupyter-ydoc             0.2.3
jupyterlab               3.6.2
jupyterlab-pygments      0.2.2
jupyterlab_server        2.21.0
jupyterlab-widgets       3.0.6
kiwisolver               1.4.4
langcodes                3.3.0
locket                   1.0.0
markdown-it-py           2.2.0
MarkupSafe               2.1.2
matplotlib               3.6.3
matplotlib-inline        0.1.6
mdurl                    0.1.2
mistune                  2.0.5
monotonic                1.6
multimethod              1.9.1
murmurhash               1.0.9
nbclassic                0.5.3
nbclient                 0.7.2
nbconvert                7.2.10
nbformat                 5.8.0
neo4j                    5.6.0
nest-asyncio             1.5.6
networkx                 3.0
notebook                 6.5.3
notebook_shim            0.2.2
numpy                    1.23.5
openpyxl                 3.1.2
packaging                23.0
pandas                   1.5.3
pandas-profiling         3.6.6
pandocfilters            1.5.0
pansi                    2020.7.3
parso                    0.8.3
partd                    1.3.0
pathy                    0.10.1
patsy                    0.5.3
pexpect                  4.8.0
phik                     0.12.3
pickleshare              0.7.5
Pillow                   9.4.0
pip                      22.0.2
platformdirs             3.2.0
preshed                  3.0.8
prometheus-client        0.16.0
prompt-toolkit           3.0.38
protobuf                 3.20.3
psutil                   5.9.4
ptyprocess               0.7.0
pure-eval                0.2.2
py2neo                   2021.2.3
pyarrow                  11.0.0
pycparser                2.21
pydantic                 1.10.7
pydeck                   0.8.0
Pygments                 2.14.0
Pympler                  1.0.1
pyparsing                3.0.9
pyrsistent               0.19.3
python-dateutil          2.8.2
python-json-logger       2.0.7
pytz                     2023.2
pytz-deprecation-shim    0.1.0.post0
PyWavelets               1.4.1
PyYAML                   6.0
pyzmq                    25.0.2
regex                    2023.3.23
requests                 2.28.2
rfc3339-validator        0.1.4
rfc3986-validator        0.1.1
rich                     13.3.2
scikit-learn             1.2.2
scipy                    1.9.3
seaborn                  0.12.2
semver                   2.13.0
Send2Trash               1.8.0
setuptools               59.6.0
six                      1.16.0
smart-open               6.3.0
smmap                    5.0.0
sniffio                  1.3.0
soupsieve                2.4
spacy                    3.5.1
spacy-legacy             3.0.12
spacy-loggers            1.0.4
SQLAlchemy               2.0.7
srsly                    2.4.6
stack-data               0.6.2
statsmodels              0.13.5
streamlit                1.20.0
tangled-up-in-unicode    0.2.0
terminado                0.17.1
thinc                    8.1.9
threadpoolctl            3.1.0
tinycss2                 1.2.1
toml                     0.10.2
tomli                    2.0.1
toolz                    0.12.0
tornado                  6.2
tqdm                     4.64.1
traitlets                5.9.0
typeguard                2.13.3
typer                    0.7.0
typing_extensions        4.5.0
tzdata                   2023.2
tzlocal                  4.3
uri-template             1.2.0
urllib3                  1.26.15
validators               0.20.0
visions                  0.7.5
wasabi                   1.1.1
watchdog                 3.0.0
wcwidth                  0.2.6
webcolors                1.12
webencodings             0.5.1
websocket-client         1.5.1
wheel                    0.40.0
widgetsnbextension       4.0.6
y-py                     0.5.9
ydata-profiling          4.1.1
ypy-websocket            0.8.2
zipp                     3.15.0
```


### OS

Pop_OS! 22.04

### Checklist

- [X] There is not yet another bug report for this issue in the [issue tracker](https://github.com/ydataai/pandas-profiling/issues)
- [X] The problem is reproducible from this bug report. [This guide](http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) can help to craft a minimal bug report.
- [X] The issue has not been resolved by the entries listed under [Common Issues](https://pandas-profiling.ydata.ai/docs/master/pages/support_contrib/common_issues.html).

fabclmnt commented 1 year ago

Hi @cj2001,

This error comes from the tangled_up_in_unicode package, which ydata-profiling uses, when the user requests it, to compute statistics for string and categorical data.
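
One way to check this is to import the package on its own in a fresh kernel, with ydata-profiling out of the picture. The snippet below is a minimal sketch: `try_import` is a hypothetical helper (not part of either library) built on the standard-library `importlib`. If the same SystemError shows up here, that would suggest the problem lies in loading the package itself rather than in the profiling code.

```python
import importlib
import sys


def try_import(module_name: str) -> tuple[bool, str]:
    """Import a module by name and report whether the import succeeded.

    Hypothetical helper for illustration only.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # SystemError and ImportError are both Exception subclasses
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Record which interpreter the kernel is actually running.
    print("Python", sys.version.split()[0], "at", sys.executable)
    # Import the package named in the traceback, with nothing else loaded.
    print(try_import("tangled_up_in_unicode"))
```

Running this once from the notebook and once from a plain `python` shell in the same environment would also show whether the failure depends on Jupyter.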

I suggest using ydata-profiling without this package installed, or reporting this to the tangled-up-in-unicode project directly. It might not support Python 3.10 yet.
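
A possible alternative to uninstalling is sketched below. In the traceback above, the import sits behind a check on `config.vars.cat.characters`, so switching that setting off may skip the import entirely. This assumes `ProfileReport` accepts a nested `vars` override as a keyword argument. `UNICODE_FREE_SETTINGS` and `build_report` are hypothetical names for illustration, and whether other code paths still import the package has not been verified.

```python
# Assumed settings override: turns off the character-level (Unicode) summary
# that the traceback shows guarded by `config.vars.cat.characters`.
UNICODE_FREE_SETTINGS = {"vars": {"cat": {"characters": False}}}


def build_report(df):
    """Build a profile report with the character summary disabled.

    Hypothetical helper; the import is done lazily so that this module can be
    loaded even where ydata-profiling is not installed.
    """
    from ydata_profiling import ProfileReport

    return ProfileReport(df, **UNICODE_FREE_SETTINGS)
```

In the notebook from the report this would be called as `build_report(test_df)`. The trade-off is that the report would no longer include the per-character Unicode statistics for categorical columns.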