Open cagan-elden opened 2 months ago
I found that upgrading from NLTK 3.8.1 to 3.9.1 broke my project. I now get errors asking me to run:
python -m textblob.download_corpora
Previously you could download textblob corpora on one account and it could be found by another account. This is no longer the case.
Moving back to NLTK 3.8.1 fixed it. I can reproduce the issue by upgrading to 3.9.1 again.
The problem is due to the version. Moving back to NLTK 3.8.1 rectifies the error.
To follow up on this, I fixed it by specifying the NLTK data path and telling NLTK where to look like this:
def download_nltk_resources(self):
    """
    Downloads required NLTK resources if not already present.
    """
    import nltk
    import os
    # Use the environment variable or fall back to default
    nltk_data_path = os.getenv('NLTK_DATA', '/usr/local/share/nltk_data')
    # Ensure the directory exists
    os.makedirs(nltk_data_path, exist_ok=True)
    # Add our path to NLTK's data path
    nltk.data.path.insert(0, nltk_data_path)
    print(f"Using NLTK data path: {nltk_data_path}")
    required_resources = {
        'averaged_perceptron_tagger': ('taggers', 'averaged_perceptron_tagger'),
        'averaged_perceptron_tagger_eng': ('taggers', 'averaged_perceptron_tagger_eng'),
        'punkt': ('tokenizers', 'punkt'),
        'punkt_tab': ('tokenizers/punkt_tab', 'english'),
        'movie_reviews': ('corpora', 'movie_reviews'),
        'brown': ('corpora', 'brown'),
        'conll2000': ('corpora', 'conll2000'),
        'wordnet': ('corpora', 'wordnet')
    }
    # Download and verify all resources
    for resource, (folder, name) in required_resources.items():
        try:
            nltk.data.find(f'{folder}/{name}')
        except LookupError:
            print(f"Downloading {resource}...")
            nltk.download(resource, download_dir=nltk_data_path, quiet=True)
with NLTK_DATA specified as an environment variable.
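One caveat with the function above: every call runs nltk.data.path.insert(0, ...), so the same directory can end up in the search list more than once. Here is a minimal sketch of a duplicate-free variant. It assumes nltk.data.path is a plain Python list and that NLTK_DATA holds a single directory, as in the post. The names resolve_data_dir, register_data_dir and apply_to_nltk are made up for illustration and are not part of NLTK or TextBlob.

```python
import os

# Same fallback directory as in the function above.
DEFAULT_DATA_DIR = "/usr/local/share/nltk_data"


def resolve_data_dir(default=DEFAULT_DATA_DIR):
    """Return the directory named by NLTK_DATA, or the fallback if it is unset.

    Assumption: NLTK_DATA names one directory, not a list of directories.
    """
    return os.environ.get("NLTK_DATA", default)


def register_data_dir(search_path, data_dir):
    """Move data_dir to the front of search_path without duplicating it."""
    while data_dir in search_path:
        search_path.remove(data_dir)
    search_path.insert(0, data_dir)
    return search_path


def apply_to_nltk():
    """Put the shared directory first in NLTK's search list.

    nltk is imported lazily so the helpers above stay usable without it.
    """
    import nltk
    return register_data_dir(nltk.data.path, resolve_data_dir())
```

Calling apply_to_nltk() once at startup, before anything touches TextBlob, should have the same effect as the insert(0, ...) line above, and calling it again leaves the list unchanged.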
Then do something like this:
try:
    # Download resources only once at the start
    if not hasattr(TextParser, '_resources_checked'):
        self.download_nltk_resources()
        TextParser._resources_checked = True
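For anyone who wants to see the run-once guard in isolation, here is a self-contained sketch. Only the TextParser name and the _resources_checked flag come from the snippet above. The constructor argument ensure_resources and the parse method are hypothetical stand-ins, so the example runs without NLTK installed.

```python
class TextParser:
    # Class-level flag shared by every instance. Starting it at False
    # replaces the hasattr() check used in the snippet above.
    _resources_checked = False

    def __init__(self, ensure_resources):
        # ensure_resources is a hypothetical callable that would download or
        # verify the NLTK data, e.g. a download_nltk_resources function.
        self._ensure_resources = ensure_resources

    def parse(self, text):
        # Run the resource check only on the first call in this process.
        if not TextParser._resources_checked:
            self._ensure_resources()
            # Set the flag only after the check succeeded, so a failed
            # download is retried on the next call.
            TextParser._resources_checked = True
        # Placeholder for the real parsing work.
        return text.split()
```

The flag lives on the class rather than on self, so a second TextParser instance skips the check. It is per process, though, so each worker process would still run the check once.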
python -m textblob.download_corpora
Although I downloaded the corpora as the error message said, it still does not work. I'm not sure whether the NLTK library is the cause, because I've installed that too.