tonykipkemboi / ollama_pdf_rag

A demo Jupyter Notebook showcasing a simple local RAG (Retrieval Augmented Generation) pipeline to chat with your PDFs.
MIT License
151 stars 71 forks

PermissionError: [Errno 13] Permission denied: #11

Open ziayounasch opened 1 month ago

ziayounasch commented 1 month ago

Hi, when I upload a PDF file, I get the following error instead of the embeddings being created. I also tried installing poppler with pip, but that did not succeed. I am running this on Windows 11. Could you please guide me? I would really appreciate it.

Screenshot 2024-07-10 130301

`2024-07-10 12:54:50 - INFO - HTTP Request: GET http://127.0.0.1:11434/api/tags "HTTP/1.1 200 OK"
2024-07-10 12:54:50 - INFO - Extracting model names from models_info
2024-07-10 12:54:50 - INFO - Extracted model names: ('nomic-embed-text:latest', 'phi3:14b-medium-128k-instruct-q4_0', 'mistral:latest', 'mixtral:8x7b', 'llama3:latest')
2024-07-10 12:54:56 - INFO - HTTP Request: GET http://127.0.0.1:11434/api/tags "HTTP/1.1 200 OK"
2024-07-10 12:54:56 - INFO - Creating vector DB from file upload: WEF_The_Global_Cooperation_Barometer_2024.pdf
2024-07-10 12:54:56 - INFO - File saved to temporary path: C:\Users\Ziach\AppData\Local\Temp\tmp7mzshwdt\WEF_The_Global_Cooperation_Barometer_2024.pdf
2024-07-10 12:55:43 - INFO - pikepdf C++ to Python logger bridge initialized
2024-07-10 12:55:48 - INFO - PDF text extraction failed, skip text extraction...
2024-07-10 12:55:48.224 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\pdf2image\pdf2image.py", line 581, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 278, in <module>
    main()
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 223, in main
    st.session_state["vector_db"] = create_vector_db(file_upload)
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 82, in create_vector_db
    data = loader.load()
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 89, in lazy_load
    elements = self._get_elements()
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\langchain_community\document_loaders\pdf.py", line 73, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\documents\elements.py", line 593, in wrapper
    elements = func(*args, **kwargs)
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\file_utils\filetype.py", line 626, in wrapper
    elements = func(*args, **kwargs)
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\file_utils\filetype.py", line 582, in wrapper
    elements = func(*args, **kwargs)
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\partition\pdf.py", line 202, in partition_pdf
    return partition_pdf_or_image(
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\partition\pdf.py", line 331, in partition_pdf_or_image
    elements = _partition_pdf_or_image_with_ocr(
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\partition\pdf.py", line 848, in _partition_pdf_or_image_with_ocr
    for page_number, image in enumerate(
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\unstructured\partition\pdf_image\pdf_image_utils.py", line 395, in convert_pdf_to_images
    info = pdf2image.pdfinfo_from_path(filename)
  File "C:\Users\Ziach\anaconda3\envs\ollamarag\lib\site-packages\pdf2image\pdf2image.py", line 607, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?`

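The last exception in the log above, `PDFInfoNotInstalledError`, is raised by `pdf2image` when it cannot launch the `pdfinfo` executable that ships with poppler. As far as I know, `pdf2image` needs the poppler command-line binaries to be reachable on `PATH`, and a `pip` package alone may not provide them on Windows. A minimal sketch for checking this from inside the same conda environment, using only the standard library (the function name `find_poppler_tools` is made up for illustration):

```python
import shutil


def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Map each poppler tool name to its full path, or None if not on PATH."""
    # shutil.which searches PATH the same way the shell would.
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for name, location in find_poppler_tools().items():
        status = location if location else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```

If either tool is reported as not found, the poppler `bin` folder is probably not on `PATH` for the process that launches Streamlit.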
ziayounasch commented 1 month ago

I have figured out the issue mentioned above and it is resolved, but now I am stuck on a new error, shown below. Could you please guide me? I would really appreciate it.

2024-07-11 12:57:04 - INFO - HTTP Request: GET http://127.0.0.1:11434/api/tags "HTTP/1.1 200 OK"
2024-07-11 12:57:04 - INFO - Extracting model names from models_info
2024-07-11 12:57:04 - INFO - Extracted model names: ('nomic-embed-text:latest', 'phi3:14b-medium-128k-instruct-q4_0', 'mistral:latest', 'mixtral:8x7b', 'llama3:latest')
2024-07-11 12:57:11 - INFO - HTTP Request: GET http://127.0.0.1:11434/api/tags "HTTP/1.1 200 OK"
2024-07-11 12:57:11 - INFO - Creating vector DB from file upload: WEF_The_Global_Cooperation_Barometer_2024.pdf
2024-07-11 12:57:11 - INFO - File saved to temporary path: C:\Users\Ziach\AppData\Local\Temp\tmp1acgx7_2\WEF_The_Global_Cooperation_Barometer_2024.pdf
2024-07-11 12:57:22 - INFO - pikepdf C++ to Python logger bridge initialized
2024-07-11 12:57:25 - INFO - PDF text extraction failed, skip text extraction...
2024-07-11 12:58:52.470 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 278, in <module>
    main()
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 223, in main
    st.session_state["vector_db"] = create_vector_db(file_upload)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 82, in create_vector_db
    data = loader.load()
           ^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\langchain_community\document_loaders\unstructured.py", line 89, in lazy_load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\langchain_community\document_loaders\pdf.py", line 73, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\documents\elements.py", line 593, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\file_utils\filetype.py", line 626, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\file_utils\filetype.py", line 582, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\pdf.py", line 202, in partition_pdf
    return partition_pdf_or_image(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\pdf.py", line 341, in partition_pdf_or_image
    out_elements = _process_uncategorized_text_elements(elements)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\pdf.py", line 920, in _process_uncategorized_text_elements
    new_el = element_from_text(cast(Text, el).text)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text.py", line 294, in element_from_text
    elif is_possible_narrative_text(text):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text_type.py", line 80, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text_type.py", line 276, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
       ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text_type.py", line 225, in sentence_count
    sentences = sent_tokenize(text)
                ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\nlp\tokenize.py", line 136, in sent_tokenize
    _download_nltk_packages_if_not_present()
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\nlp\tokenize.py", line 130, in _download_nltk_packages_if_not_present
    download_nltk_packages()
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\nlp\tokenize.py", line 88, in download_nltk_packages
    urllib.request.urlretrieve(NLTK_DATA_URL, tgz_file)
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\urllib\request.py", line 250, in urlretrieve
    tfp = open(filename, 'wb')
          ^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Ziach\\AppData\\Local\\Temp\\tmpgxd_0ag7'

Screenshot of permission denied

tonykipkemboi commented 1 month ago

@ziayounasch, it looks like you don't have write permission to the temp directory C:\Users\Ziach\AppData\Local\Temp, so the file can't be saved there temporarily. You can confirm this by manually creating and deleting a file in that directory.

One potential solution is to run the application as an administrator: right-click your Python IDE or Command Prompt and select "Run as administrator". Let me know if this resolves the issue.
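The manual check suggested above can also be scripted. A small sketch that assumes nothing beyond the standard library (the function name and the probe file name are arbitrary and hypothetical):

```python
import os
import tempfile


def can_write_to_temp_dir() -> bool:
    """Return True if a file can be created, written, and deleted in the temp dir."""
    temp_dir = tempfile.gettempdir()
    # Hypothetical probe file name, chosen only for this check.
    probe_path = os.path.join(temp_dir, "write_probe_ollama_pdf_rag.tmp")
    try:
        with open(probe_path, "wb") as probe:
            probe.write(b"ok")
        os.remove(probe_path)
        return True
    except OSError as exc:  # PermissionError is a subclass of OSError
        print(f"Cannot write to {temp_dir}: {exc}")
        return False


if __name__ == "__main__":
    print("Temp directory:", tempfile.gettempdir())
    print("Writable:", can_write_to_temp_dir())
```

Running it from the same activated conda environment and the same prompt that launches Streamlit gives the most relevant answer, since the temp directory depends on that process's environment variables.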

ziayounasch commented 1 month ago

I ran Command Prompt with "Run as administrator" and got the same error. I also checked the temp directory, and I can create and delete files there. In the log, the app stops at the following point: 2024-07-11 19:45:43 - INFO - PDF text extraction failed, skip text extraction... After this it throws the error. One more thing: I renamed my user directory on this laptop from another name to Ziach, after which VS Code started giving errors; I uninstalled and reinstalled VS Code and it started working fine. This time the traceback is as follows:

`C:\Windows\System32>cd C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main

C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main>streamlit run streamlit_app.py
'streamlit' is not recognized as an internal or external command, operable program or batch file.

C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main>conda activate OLLAMARAG

(OLLAMARAG) C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main>streamlit run streamlit_app.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://192.168.0.4:8501

2024-07-11 19:45:20 - INFO - HTTP Request: GET http://127.0.0.1:11434/api/tags "HTTP/1.1 200 OK"
2024-07-11 19:45:20 - INFO - Extracting model names from models_info
2024-07-11 19:45:20 - INFO - Extracted model names: ('nomic-embed-text:latest', 'phi3:14b-medium-128k-instruct-q4_0', 'mistral:latest', 'mixtral:8x7b', 'llama3:latest')
2024-07-11 19:45:30 - INFO - HTTP Request: GET http://127.0.0.1:11434/api/tags "HTTP/1.1 200 OK"
2024-07-11 19:45:30 - INFO - Creating vector DB from file upload: WEF_The_Global_Cooperation_Barometer_2024.pdf
2024-07-11 19:45:30 - INFO - File saved to temporary path: C:\Users\Ziach\AppData\Local\Temp\tmp_ez9r60m\WEF_The_Global_Cooperation_Barometer_2024.pdf
2024-07-11 19:45:41 - INFO - pikepdf C++ to Python logger bridge initialized
2024-07-11 19:45:43 - INFO - PDF text extraction failed, skip text extraction...
2024-07-11 19:47:31.549 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 278, in <module>
    main()
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 223, in main
    st.session_state["vector_db"] = create_vector_db(file_upload)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\Documents\Python Apps\RAG Ollama\ollama_pdf_rag-main\streamlit_app.py", line 82, in create_vector_db
    data = loader.load()
           ^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\langchain_community\document_loaders\unstructured.py", line 89, in lazy_load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\langchain_community\document_loaders\pdf.py", line 73, in _get_elements
    return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\documents\elements.py", line 593, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\file_utils\filetype.py", line 626, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\file_utils\filetype.py", line 582, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\pdf.py", line 202, in partition_pdf
    return partition_pdf_or_image(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\pdf.py", line 341, in partition_pdf_or_image
    out_elements = _process_uncategorized_text_elements(elements)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\pdf.py", line 920, in _process_uncategorized_text_elements
    new_el = element_from_text(cast(Text, el).text)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text.py", line 294, in element_from_text
    elif is_possible_narrative_text(text):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text_type.py", line 80, in is_possible_narrative_text
    if exceeds_cap_ratio(text, threshold=cap_threshold):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text_type.py", line 276, in exceeds_cap_ratio
    if sentence_count(text, 3) > 1:
       ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\partition\text_type.py", line 225, in sentence_count
    sentences = sent_tokenize(text)
                ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\nlp\tokenize.py", line 136, in sent_tokenize
    _download_nltk_packages_if_not_present()
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\nlp\tokenize.py", line 130, in _download_nltk_packages_if_not_present
    download_nltk_packages()
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\site-packages\unstructured\nlp\tokenize.py", line 88, in download_nltk_packages
    urllib.request.urlretrieve(NLTK_DATA_URL, tgz_file)
  File "C:\Users\Ziach\anaconda3\envs\OLLAMARAG\Lib\urllib\request.py", line 250, in urlretrieve
    tfp = open(filename, 'wb')
          ^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Ziach\\AppData\\Local\\Temp\\tmpki740nav'`
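One possible explanation, which nothing in this thread confirms: the failing call is `urlretrieve(NLTK_DATA_URL, tgz_file)` trying to `open(..., 'wb')` a path directly under `Temp`. If `tgz_file` happens to be the name of a temporary file that is still held open (an assumption about `unstructured` internals, not something visible in the traceback), Windows refuses the second open with `PermissionError` regardless of folder permissions or administrator rights, while Unix-like systems allow it. The sketch below only illustrates that platform difference and a pattern that avoids it; both function names are hypothetical.

```python
import os
import tempfile


def can_reopen_open_named_tempfile() -> bool:
    """Try to open a NamedTemporaryFile by name while it is still open.

    Typically False on Windows and True on Unix-like systems.
    """
    with tempfile.NamedTemporaryFile() as tmp_file:
        try:
            with open(tmp_file.name, "wb"):
                return True
        except PermissionError:
            return False


def write_via_temp_dir(payload: bytes, filename: str = "download.tgz") -> bytes:
    """Portable pattern: create a temp directory, then a fresh file inside it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = os.path.join(tmp_dir, filename)
        with open(target, "wb") as fh:  # nothing else holds this path open
            fh.write(payload)
        with open(target, "rb") as fh:
            return fh.read()


if __name__ == "__main__":
    print("Can reopen an open NamedTemporaryFile:", can_reopen_open_named_tempfile())
    print("Round trip via temp dir:", write_via_temp_dir(b"example"))
```

If the first function prints `False` on the affected machine, the problem would sit in how the download target is created rather than in the Temp folder's permissions, which would match the observation that manual create and delete works there.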

ziayounasch commented 1 month ago

I had been looking for an app like this for a very long time and finally found one, but it isn't running on my system. Could you please help resolve the issue? I would really appreciate it. Thanks!