techleadhd / chatgpt-retrieval

Problem with partition_pdf module #32

Open decsousa opened 11 months ago

decsousa commented 11 months ago

Hello, when I try to run the code the following error is displayed:

Traceback (most recent call last):
  File "C:\Users\Diego Sousa\Desktop\botchatgpt\botchatgpt\chat02.py", line 35, in <module>
    index = VectorstoreIndexCreator().from_loaders([loader])
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\indexes\vectorstore.py", line 72, in from_loaders
    docs.extend(loader.load())
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 137, in load
    self.load_file(i, p, docs, pbar)
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 94, in load_file
    raise e
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\directory.py", line 88, in load_file
    sub_docs = self.loader_cls(str(item), **self.loader_kwargs).load()
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 86, in load
    elements = self._get_elements()
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\document_loaders\unstructured.py", line 171, in _get_elements
    return partition(filename=self.file_path, **self.unstructured_kwargs)
  File "C:\Users\Diego Sousa\AppData\Local\Programs\Python\Python311\Lib\site-packages\unstructured\partition\auto.py", line 221, in partition
    elements = partition_pdf(
NameError: name 'partition_pdf' is not defined. Did you mean: 'partition_xml'?

Has anyone else had this problem?

psujit775 commented 11 months ago

+1

GavinXZhang commented 11 months ago

+1

JayKayNJIT commented 10 months ago

Following

fengmzhu commented 10 months ago

+1

3dylson commented 10 months ago

To make it work I had to do the following:

In the file .../site-packages/unstructured/partition/auto.py, add the line: from unstructured.partition.pdf import partition_pdf

Then run: pip3 install pdf2image pdfminer.six

Finally, if you are on macOS, search for 'Install Certificates.command' in Finder and open it.

Then run the following in the terminal:

python3
import nltk
nltk.download()
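
Before editing anything under site-packages, it may help to check which of the packages from these steps can actually be found by the interpreter. This is only a rough sketch: the helper name find_missing is hypothetical, and the import names (pdf2image, pdfminer for the pdfminer.six package, nltk) are assumptions based on the packages named above.

```python
import importlib.util


def find_missing(module_names):
    """Return the top-level module names that the interpreter cannot locate."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Import names assumed for the packages installed in the steps above.
    wanted = ["pdf2image", "pdfminer", "nltk"]
    missing = find_missing(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing would need to be installed into the same environment that runs the script.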

bobbyfongprivate commented 10 months ago

Downgrading to version 0.7.12 resolved the problem for me. You can do this by running the following command in your virtual environment:

pip install unstructured==0.7.12
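
To confirm that the downgrade took effect in the environment your script actually uses, the installed version can be read back from Python. A minimal sketch using the standard library; the helper name installed_version is hypothetical, and 0.7.12 is simply the version mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    expected = "0.7.12"  # the version suggested in this comment
    found = installed_version("unstructured")
    if found is None:
        print("unstructured is not installed in this environment")
    elif found == expected:
        print("unstructured", found, "is installed")
    else:
        print("unstructured", found, "is installed, expected", expected)
```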

fire115 commented 10 months ago

pip install unstructured==0.7.12 works

Zhi0467 commented 18 hours ago

To make it work I had to:

at the file .../site-packages/unstructured/partition/auto.py

add the line: from unstructured.partition.pdf import partition_pdf

then pip3 install pdf2image pdfminer.six

last if you have macOS, search 'Install Certificates.command' in the finder and open it.

Then do the following steps in the terminal:

python3
import nltk
nltk.download()

I tried this but then I got this error:

  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py", line 168, in _get_elements
    from unstructured.partition.auto import partition
  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/unstructured/partition/auto.py", line 28, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "/Users/wangzhi/anaconda3/envs/chat/lib/python3.12/site-packages/unstructured/partition/pdf.py", line 19, in <module>
    from pillow_heif import register_heif_opener
ModuleNotFoundError: No module named 'pillow_heif'

Any ideas, please? @3dylson
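
The last line of that traceback names a module that could not be imported, which suggests one more dependency is missing from the environment. One way to surface such gaps one at a time is to attempt the import directly and report the module Python failed to find. This is a sketch only: the helper name first_missing_dependency is hypothetical, and the PyPI package name may differ from the import name (pillow_heif is, as far as I know, published as pillow-heif).

```python
import importlib


def first_missing_dependency(module_name):
    """Try to import module_name; return the name of the module that
    could not be found, or None if the import succeeded."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        return exc.name
    return None


if __name__ == "__main__":
    # The module the manual edit to auto.py tries to import.
    missing = first_missing_dependency("unstructured.partition.pdf")
    if missing is None:
        print("unstructured.partition.pdf imported without errors")
    else:
        print("missing module:", missing)
```

After installing whatever is reported, running it again shows whether another dependency is still missing.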