sloria / TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
https://textblob.readthedocs.io/
MIT License

Error running example.py #450

Open artptz opened 4 months ago

artptz commented 4 months ago
On: 31/08/2017   -2.04
CARD PAYMENT TO SHELL TOTHILL,2.04 GBP, RATE 1.00/GBP ON 29-08-2013
My guess is: 
> 6
Traceback (most recent call last):
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/decorators.py", line 35, in decorated
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/tokenizers.py", line 59, in tokenize
    return nltk.tokenize.sent_tokenize(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/Users/arturo/nltk_data'
    - '/Users/arturo/Documents/GitHub/BankClassify/.venv/nltk_data'
    - '/Users/arturo/Documents/GitHub/BankClassify/.venv/share/nltk_data'
    - '/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/arturo/Documents/GitHub/BankClassify/example.py", line 5, in <module>
    bc.add_data("Statement_Example.txt")
  File "/Users/arturo/Documents/GitHub/BankClassify/BankClassify.py", line 58, in add_data
    self._ask_with_guess(self.new_data)
  File "/Users/arturo/Documents/GitHub/BankClassify/BankClassify.py", line 154, in _ask_with_guess
    self.classifier.update([(stripped_text, category)   ])
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/classifiers.py", line 292, in update
    self._word_set.update(_get_words_from_dataset(new_data))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/classifiers.py", line 64, in _get_words_from_dataset
    return set(all_words)
           ^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/classifiers.py", line 63, in <genexpr>
    all_words = chain.from_iterable(tokenize(words) for words, _ in dataset)
                                    ^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/classifiers.py", line 59, in tokenize
    return word_tokenize(words, include_punc=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/tokenizers.py", line 76, in word_tokenize
    for sentence in sent_tokenize(text)
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/base.py", line 67, in itokenize
    return (t for t in self.tokenize(text, *args, **kwargs))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arturo/Documents/GitHub/BankClassify/.venv/lib/python3.12/site-packages/textblob/decorators.py", line 37, in decorated
    raise MissingCorpusError() from error
textblob.exceptions.MissingCorpusError: 
Looks like you are missing some required data for this feature.

To download the necessary data, simply run

    python -m textblob.download_corpora

or use the NLTK downloader to download the missing data: http://nltk.org/data.html
If this doesn't fix the problem, file an issue at https://github.com/sloria/TextBlob/issues.

Process finished with exit code 1

I ran python -m textblob.download_corpora, but I still received the above error.
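A rough diagnostic sketch (standard NLTK calls only, nothing repo-specific) to confirm which directories this interpreter actually searches, and to force the download into one of them:

    import nltk

    # The directories this interpreter searches for NLTK data; these should
    # match the "Searched in:" list from the traceback above.
    print(nltk.data.path)

    # Download the punkt models explicitly into the first searched directory,
    # in case the earlier download went into a different Python environment.
    nltk.download("punkt", download_dir=nltk.data.path[0])

    # Raises LookupError if punkt still cannot be resolved.
    nltk.data.find("tokenizers/punkt")

If nltk.data.path here doesn't include the directory that python -m textblob.download_corpora wrote to, that would explain why the error persists.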

396449673 commented 1 month ago

I have the same error

alvindera97 commented 1 month ago

Same error. Wondering if there's something I could be missing. I'm trying to run the example from a virtual environment, though.

Noticed it's been a while since there were any changes to the repo (about 6 months before the time of writing; last commit: https://github.com/sloria/TextBlob/commit/c27324d9986fdfa56d4337c3bce952f2b057ceb4). I'd roughly say this project isn't really maintained anymore, or the author(s)/contributor(s) haven't had the time to address some issues as of late.

Still hope to hear from them whenever someone's available.

elifbeyzatok00 commented 1 month ago

I have the same error too:

from textblob import TextBlob

sample_text = "I love data science and machine learning. I love coding. I love data science and coding."
TextBlob(sample_text).ngrams(3)  # 3-gram

LookupError                               Traceback (most recent call last)
File c:\Users\tokel\anaconda3\Lib\site-packages\textblob\decorators.py:35, in requires_nltk_corpus.<locals>.decorated(*args, **kwargs)
     34 try:
---> 35     return func(*args, **kwargs)
     36 except LookupError as error:

File c:\Users\tokel\anaconda3\Lib\site-packages\textblob\tokenizers.py:59, in SentenceTokenizer.tokenize(self, text)
     58 """Return a list of sentences."""
---> 59 return nltk.tokenize.sent_tokenize(text)

File c:\Users\tokel\anaconda3\Lib\site-packages\nltk\tokenize\__init__.py:119, in sent_tokenize(text, language)
    110 """
    111 Return a sentence-tokenized copy of *text*,
    112 using NLTK's recommended sentence tokenizer
   (...)
    117 :param language: the model name in the Punkt corpus
    118 """
--> 119 tokenizer = _get_punkt_tokenizer(language)
    120 return tokenizer.tokenize(text)

File c:\Users\tokel\anaconda3\Lib\site-packages\nltk\tokenize\__init__.py:105, in _get_punkt_tokenizer(language)
     98 """
     99 A constructor for the PunktTokenizer that utilizes
    100 a lru cache for performance.
...
    python -m textblob.download_corpora

or use the NLTK downloader to download the missing data: http://nltk.org/data.html
If this doesn't fix the problem, file an issue at https://github.com/sloria/TextBlob/issues.
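One thing worth noting about this second traceback: it goes through _get_punkt_tokenizer, which is the NLTK 3.9+ code path, and that version resolves the sentence tokenizer from the punkt_tab resource rather than the old punkt pickle. If that's the NLTK that's installed, python -m textblob.download_corpora (which, at least in older TextBlob releases, fetches punkt but not punkt_tab) wouldn't be enough on its own. A minimal sketch of the workaround, assuming a recent NLTK:

    import nltk

    # Older NLTK resolves sent_tokenize via tokenizers/punkt; NLTK 3.9+
    # loads tokenizers/punkt_tab instead. Fetch both to cover either path.
    nltk.download("punkt")
    nltk.download("punkt_tab")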