sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
2.01k stars 204 forks

use create_report get AttributeError: module 'regex' has no attribute 'Pattern' #705

Open liushuishuibanye opened 2 years ago

liushuishuibanye commented 2 years ago

Python 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0] :: Anaconda, Inc. on linux
dataprep 0.3.0

code: from dataprep.eda import create_report

bug:

AttributeError Traceback (most recent call last)

in
      1 import seaborn as sns
----> 2 from dataprep.eda import create_report
      3
      4 import matplotlib.pyplot as plt
      5 import warnings

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/dataprep/eda/__init__.py in
      7 from ..utils import is_notebook
      8 from .correlation import compute_correlation, plot_correlation, render_correlation
----> 9 from .create_report import create_report
     10 from .distribution import compute, plot, render
     11 from .dtypes import (

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/dataprep/eda/create_report/__init__.py in
     10
     11 from ..configs import Config
---> 12 from .formatter import format_report
     13 from .report import Report
     14

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/dataprep/eda/create_report/formatter.py in
     13 from ..correlation import render_correlation
     14 from ..correlation.compute.overview import correlation_nxn
---> 15 from ..distribution import render
     16 from ..utils import _calc_line_dt
     17 from ..distribution.compute.overview import calc_stats

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/dataprep/eda/distribution/__init__.py in
     12 from ..dtypes_v2 import DTypeDef, LatLong
     13 from ...progress_bar import ProgressBar
---> 14 from .compute import compute
     15 from .render import render
     16

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/dataprep/eda/distribution/compute/__init__.py in
     15 from .overview import compute_overview
     16 from .trivariate import compute_trivariate
---> 17 from .univariate import compute_univariate
     18
     19 __all__ = ["compute"]

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/dataprep/eda/distribution/compute/univariate.py in
     12 import pandas as pd
     13 from dask.array.stats import chisquare, kurtosis, skew
---> 14 from nltk.stem import PorterStemmer, WordNetLemmatizer
     15
     16 from ....assets.english_stopwords import english_stopwords as ess

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/nltk/__init__.py in
    135 from nltk.grammar import *
    136 from nltk.probability import *
--> 137 from nltk.text import *
    138 from nltk.tree import *
    139 from nltk.util import *

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/nltk/text.py in
     27 from nltk.probability import ConditionalFreqDist as CFD
     28 from nltk.probability import FreqDist
---> 29 from nltk.tokenize import sent_tokenize
     30 from nltk.util import LazyConcatenation, tokenwrap
     31

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/nltk/tokenize/__init__.py in
     63
     64 from nltk.data import load
---> 65 from nltk.tokenize.casual import TweetTokenizer, casual_tokenize
     66 from nltk.tokenize.destructive import NLTKWordTokenizer
     67 from nltk.tokenize.legality_principle import LegalitySyllableTokenizer

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/nltk/tokenize/casual.py in
    270
    271
--> 272 class TweetTokenizer:
    273     r"""
    274     Tokenizer for tweets.

/opt/mlx_deploy/miniconda3/envs/mlx/lib/python3.7/site-packages/nltk/tokenize/casual.py in TweetTokenizer()
    355
    356     @property
--> 357     def WORD_RE(self) -> regex.Pattern:
    358         """Core TweetTokenizer regex"""
    359         # Compiles the regex for this and all future instantiations of TweetTokenizer.

AttributeError: module 'regex' has no attribute 'Pattern'

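The last frame of the traceback fails while evaluating the return annotation `regex.Pattern`, which suggests the installed `regex` module does not expose a `Pattern` attribute. A minimal sketch for checking that directly, without going through `nltk`, might look like this. The helper name `module_has_attr` is hypothetical and is not part of dataprep, nltk, or regex.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True/False for attribute presence, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # False here would match the AttributeError reported above.
    print(module_has_attr("regex", "Pattern"))
```

If this prints `False`, the problem is in the installed `regex` package rather than in dataprep's own code.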
jinglinpeng commented 2 years ago

Hi @liushuishuibanye, it seems the error comes from the NLTK package. What's your NLTK version? Mine is 3.5. Maybe you could try upgrading or downgrading to 3.5 to see whether it works.
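Since `import nltk` itself raises in this environment, one way to answer the version question is to read the installed package metadata instead of importing the packages. This is only a sketch: the helper name `installed_versions` is hypothetical, and the fallback branch assumes the `importlib_metadata` backport is installed on Python 3.7.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    import importlib_metadata as metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["dataprep", "nltk", "regex"]).items():
        print(name, version)
```

Reading metadata never executes the packages' code, so the check works even when the import chain is broken.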

liushuishuibanye commented 2 years ago

I downgraded to 3.5 and got this: ImportError: cannot import name 'py25' from 'nltk.util'

eric-lemesre commented 2 years ago

With NLTK version 3.6.4 I have the same bug, but it was fixed in v3.6.5 (https://github.com/nltk/nltk/commit/6428c9288a86658cb3d9a1e91816c4bcf162a6f0). When I try to update NLTK to v3.6.5, the regex version doesn't match:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dataprep 0.2.15 requires regex<2021.0.0,>=2020.10.15, but you have regex 2021.11.10 which is incompatible.
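The conflict pip reports can be reproduced by hand with a simplified range check. The sketch below only handles purely numeric dotted versions and is not a full implementation of pip's version rules; the function names `parse_version` and `satisfies` are hypothetical. For real use, the `packaging` library's `SpecifierSet` is the more robust option.

```python
def parse_version(text):
    """Split a purely numeric dotted version such as '2021.11.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """True if lower <= version < upper, comparing numeric components in order."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    # The constraint from the pip message: regex>=2020.10.15,<2021.0.0
    print(satisfies("2021.11.10", "2020.10.15", "2021.0.0"))  # False
    print(satisfies("2020.10.15", "2020.10.15", "2021.0.0"))  # True
```

The installed regex 2021.11.10 falls outside the range dataprep 0.2.15 declares, which is why pip flags the combination.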
Maryam-Gol commented 2 years ago

I solved the issue by downgrading nltk to version 3.4.5. The initial nltk version was 3.6.4.