Describe the bug
Four days ago, nltk introduced a breaking change in the 3.8.2 release. The issue is described here. This causes any application that depends on llm-guard to crash with the following error:
File "/home/adsp/venv/lib/python3.11/site-packages/llm_guard/evaluate.py", line 51, in scan_prompt
sanitized_prompt, is_valid, risk_score = scanner.scan(sanitized_prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adsp/venv/lib/python3.11/site-packages/llm_guard/input_scanners/toxicity.py", line 100, in scan
inputs = self._match_type.get_inputs(prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adsp/venv/lib/python3.11/site-packages/llm_guard/input_scanners/toxicity.py", line 45, in get_inputs
return split_text_by_sentences(prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adsp/venv/lib/python3.11/site-packages/llm_guard/util.py", line 231, in split_text_by_sentences
return nltk.sent_tokenize(text.strip())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adsp/venv/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
tokenizer = PunktTokenizer(language)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adsp/venv/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
self.load_lang(lang)
File "/home/adsp/venv/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adsp/venv/lib/python3.11/site-packages/nltk/data.py", line 582, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/english/
Searched in:
- '/home/adsp/nltk_data'
- '/home/adsp/venv/nltk_data'
- '/home/adsp/venv/share/nltk_data'
- '/home/adsp/venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
In llm-guard's pyproject.toml file, nltk's version is specified as nltk>=3.8,<4, which causes my application to install llm-guard with nltk version 3.8.2.
I believe a quick patch would be to pin nltk to version 3.8.1 until a better solution is implemented.
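Until a pin or a proper fix lands, an application-side workaround might be to make sure the punkt_tab resource named in the error is present before llm-guard is used. The following is only a minimal sketch: the helper name ensure_punkt_tab and its injectable find/download parameters are hypothetical and not part of llm-guard or nltk; by default it relies on nltk.data.find and nltk.download, with the resource path copied from the traceback above. It assumes the environment is allowed to download NLTK data.

```python
from typing import Callable, Optional


def ensure_punkt_tab(
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> bool:
    """Ensure the punkt_tab resource is present.

    Returns True if a download was triggered, False if the resource
    was already available. Hypothetical helper, not an llm-guard API.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be exercised without nltk installed.
        import nltk

        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # Same path that appears in the LookupError message above.
        find("tokenizers/punkt_tab/english/")
        return False
    except LookupError:
        # Resource is missing: fetch it, as the error message suggests.
        download("punkt_tab")
        return True
```

Calling ensure_punkt_tab() once at application startup, before the first scan_prompt call, should avoid the LookupError on affected nltk versions. It does not replace pinning or a fix in llm-guard itself.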
To Reproduce
Spin up llm-guard and attempt to use scan_prompt
Expected behavior
llm-guard should handle the breaking change from nltk so that llm-guard does not break.
Thanks everyone! Please let me know if you need further information.