langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.29k stars 14.75k forks

Langchain document loader giving "Resource punkt_tab not found" error #25609

Open quasarswastik opened 3 weeks ago

quasarswastik commented 3 weeks ago

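### Checked other resources

### Example Code

```python
from langchain_community.document_loaders import AzureBlobStorageFileLoader

# conn_str, container, and blob are defined elsewhere in the application.
loader = AzureBlobStorageFileLoader(
    conn_str=conn_str,
    container=container,
    blob_name=blob,
)
document = loader.load()  # fails with the nltk "punkt_tab" lookup error
```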

### Error Message and Stack Trace (if applicable)

_No response_

### Description

- I am trying to load documents with LangChain's `AzureBlobStorageFileLoader`.
- Loading a document fails with an nltk error that appears to originate upstream of LangChain.
- As a temporary workaround, downgrading nltk to `nltk==3.8.1` fixes the problem.
![image](https://github.com/user-attachments/assets/f803ddc6-ecae-4b45-8c62-e35016cccc41)
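
Because the workaround hinges on which nltk version is installed, it can help to confirm what the environment actually resolved before pinning anything. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of LangChain or nltk.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Illustrative list: adjust to the packages relevant to your setup.
    report = installed_versions(["nltk", "unstructured", "langchain-community"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same container or virtual environment that raises the error shows whether a newer nltk than `3.8.1` was pulled in.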

### System Info

langchain==0.2.12
langchain-community==0.2.11
langchain-core==0.2.29
langchain-experimental==0.0.36
langchain-text-splitters==0.2.2

Platform: Ubuntu WSL2 on Windows 10
Containerisation: Docker version 27.0.2, build 912c1dd
Python: Python 3.10.12

quasarswastik commented 3 weeks ago

**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('punkt_tab')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt_tab/english/

Searched in:
- '/home/myLowPrivilegeUser/nltk_data'
- '/venv/nltk_data'
- '/venv/share/nltk_data'
- '/venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
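
The message above suggests fetching the resource with the NLTK Downloader. A minimal sketch of doing that only when the resource is missing might look like the following. The helper name `ensure_punkt_tab` is hypothetical, and the sketch assumes the process has network access and write permission to one of the searched `nltk_data` directories, which may not hold for a low-privilege user inside a container.

```python
RESOURCE_PATH = "tokenizers/punkt_tab/english/"  # path taken from the error message
RESOURCE_ID = "punkt_tab"


def ensure_punkt_tab() -> bool:
    """Return True if punkt_tab is available once this call finishes."""
    # Imported lazily so the helper can be defined even where nltk is absent.
    import nltk

    try:
        # nltk.data.find raises LookupError when the resource is not on disk.
        nltk.data.find(RESOURCE_PATH)
        return True
    except LookupError:
        # nltk.download returns False if the download did not succeed.
        return bool(nltk.download(RESOURCE_ID))
```

Calling `ensure_punkt_tab()` once at startup, before `loader.load()`, would be one way to use it. In a Docker image it is usually preferable to run the download at build time so that the runtime user does not need write access.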

dvdalilue commented 3 weeks ago

@quasarswastik I was getting the same error, you need to update unstructured and probably python-pptx

pip install unstructured==0.15.7 python-pptx==1.0.2

IMK-Stefan commented 2 days ago

Doesn't solve the issue for me

> @quasarswastik I was getting the same error, you need to update unstructured and probably python-pptx
>
> pip install unstructured==0.15.7 python-pptx==1.0.2