sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
2.01k stars 204 forks

NumExpr #770

Closed bbarclay closed 2 years ago

bbarclay commented 2 years ago

I'm getting this bug when importing dataprep.eda. I have tried it in multiple situations.

Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
NumExpr defaulting to 8 threads.
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/dataprep/eda/__init__.py", line 8, in <module>
    from .create_report import create_report
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/dataprep/eda/create_report/__init__.py", line 10, in <module>
    from .formatter import format_report
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/dataprep/eda/create_report/formatter.py", line 17, in <module>
    from ..distribution import render
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/dataprep/eda/distribution/__init__.py", line 13, in <module>
    from .compute import compute
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/dataprep/eda/distribution/compute/__init__.py", line 14, in <module>
    from .univariate import compute_univariate
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/dataprep/eda/distribution/compute/univariate.py", line 11, in <module>
    from nltk.stem import PorterStemmer, WordNetLemmatizer
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/nltk/__init__.py", line 137, in <module>
    from nltk.text import *
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/nltk/text.py", line 29, in <module>
    from nltk.tokenize import sent_tokenize
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 65, in <module>
    from nltk.tokenize.casual import TweetTokenizer, casual_tokenize
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/nltk/tokenize/casual.py", line 272, in <module>
    class TweetTokenizer:
  File "/Users/brandon/opt/anaconda3/lib/python3.9/site-packages/nltk/tokenize/casual.py", line 357, in TweetTokenizer
    def WORD_RE(self) -> regex.Pattern:
AttributeError: module 'regex' has no attribute 'Pattern'
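
A side note on the first two lines of that output: the NumExpr message is an informational note, and the failure itself is the AttributeError raised inside nltk. If the note is unwanted, a minimal sketch is to set the NUMEXPR_MAX_THREADS environment variable before anything imports numexpr. The helper name configure_numexpr_threads and the default of 8 are illustrative assumptions, not part of dataprep or NumExpr.

```python
import os


def configure_numexpr_threads(max_threads: int = 8) -> str:
    """Set NUMEXPR_MAX_THREADS unless the user already exported it.

    Hypothetical helper for illustration. It only affects NumExpr if it is
    called before numexpr is imported for the first time in the process.
    """
    # setdefault keeps any value that is already present in the environment.
    os.environ.setdefault("NUMEXPR_MAX_THREADS", str(max_threads))
    return os.environ["NUMEXPR_MAX_THREADS"]


if __name__ == "__main__":
    print("NUMEXPR_MAX_THREADS =", configure_numexpr_threads())
```

Calling this at the very top of a script, before importing dataprep, should silence the note. It does nothing about the regex.Pattern error.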

moreaupascal56 commented 2 years ago

Hello, could this be the same error as the one described here: https://stackoverflow.com/questions/69405949/attributeerror-module-regex-has-no-attribute-pattern, i.e. an issue with nltk rather than dataprep? Regards 😄

jinglinpeng commented 2 years ago

Hi @moreaupascal56, thanks for replying. Hi @bbarclay, thanks for opening the issue. It looks like an NLTK version issue, as discussed in #705. Could you try a lower version of NLTK?
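
Before changing package versions, it may help to confirm what the environment actually contains. The sketch below only uses the standard library to report whether a module imports, which distribution version is installed, and whether it exposes a given attribute. The function name diagnose is a hypothetical helper for illustration, and the sketch assumes the distribution name matches the module name (true for regex and nltk as far as I know).

```python
import importlib
import importlib.metadata


def diagnose(module_name: str = "regex", attr: str = "Pattern") -> dict:
    """Report whether `module_name` imports and exposes `attr`.

    Hypothetical diagnostic helper, not part of dataprep, nltk, or regex.
    """
    result = {
        "module": module_name,
        "installed": False,
        "version": None,
        "has_attr": False,
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return result  # module is not importable in this environment

    result["installed"] = True
    result["has_attr"] = hasattr(module, attr)
    try:
        # Assumes the distribution name equals the module name.
        result["version"] = importlib.metadata.version(module_name)
    except importlib.metadata.PackageNotFoundError:
        pass  # e.g. standard-library modules have no distribution metadata
    return result


if __name__ == "__main__":
    print(diagnose("regex", "Pattern"))
    print(diagnose("nltk", "__version__"))
```

If the first result shows has_attr as False, the installed regex package lacks the Pattern attribute that nltk's casual.py references in the traceback, and the reported versions are useful to include when following up here.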