lahoramaker / facturas2json

Facturas2json is a program that extracts structured data from invoices using the Marker and nuExtract models.
MIT License

[WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\javih\\AppData\\Local\\Temp\\tmp8n6c7nvc.pdf' #1

Open javi-ei opened 2 months ago

javi-ei commented 2 months ago

Hi, I'm on Windows. I can't get rid of this error. I've tried installing, reinstalling, changing the code, giving it time to release the file, and so on. Nothing has worked.

Any idea how to fix it? Thanks.

Error details below:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\javih\AppData\Local\Temp\tmp8n6c7nvc.pdf'
Traceback:
  File "C:\Users\javih\AppData\Local\Programs\Python\Python312\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
  File "D:\IA\facturas2json\src\facturas2json.py", line 254, in <module>
    upload_screen()
  File "D:\IA\facturas2json\src\facturas2json.py", line 148, in upload_screen
    st.session_state.markdown_texts = extract_markdown_from_pdfs(uploaded_files)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\IA\facturas2json\src\facturas2json.py", line 120, in extract_markdown_from_pdfs
    markdown_texts = list(executor.map(extract_text_from_pdf, pdf_files))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\javih\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\javih\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\javih\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\javih\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\javih\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\IA\facturas2json\src\facturas2json.py", line 49, in extract_text_from_pdf
    os.unlink(tmp_file_path)
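WinError 32 generally means that some handle to the file is still open when `os.unlink` runs. The body of `extract_text_from_pdf` is not shown in this thread, so the following is only a minimal sketch of one common way to avoid the crash, not the project's actual code. It assumes the temporary PDF is created with `tempfile.NamedTemporaryFile`; `run_on_temp_pdf`, `remove_with_retry`, and the `convert` callback are hypothetical names introduced for illustration.

```python
import os
import tempfile
import time


def remove_with_retry(path, retries=5, delay=0.2):
    """Try to delete `path`. Return True once it is gone, False if it stays locked."""
    for _ in range(retries):
        try:
            os.unlink(path)
            return True
        except FileNotFoundError:
            # Already deleted, nothing left to do.
            return True
        except PermissionError:
            # On Windows this is what WinError 32 raises; wait and try again.
            time.sleep(delay)
    return False


def run_on_temp_pdf(pdf_bytes, convert, retries=5, delay=0.2):
    """Write `pdf_bytes` to a temp file, call `convert(path)`, then clean up.

    `convert` is a hypothetical stand-in for whatever function reads the PDF.
    """
    # delete=False lets us control deletion ourselves. Leaving the `with`
    # block closes our own handle before any other code opens the file.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(pdf_bytes)
        tmp_path = tmp.name
    try:
        return convert(tmp_path)
    finally:
        # Best effort: a file that is still locked is left in the temp
        # directory instead of raising and aborting the whole run.
        remove_with_retry(tmp_path, retries, delay)
```

If the file is still locked after the retries, the handle is probably held by the library that read the PDF, and retrying alone will not release it. In that case the sketch leaves the file behind rather than raising.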

carlosar81 commented 2 months ago

Hi, same here. I'm on Windows with PyCharm.

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Carlos\AppData\Local\Temp\tmp_ssy1bo8.pdf'
Traceback:
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\.venv\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 254, in <module>
    upload_screen()
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 148, in upload_screen
    st.session_state.markdown_texts = extract_markdown_from_pdfs(uploaded_files)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 120, in extract_markdown_from_pdfs
    markdown_texts = list(executor.map(extract_text_from_pdf, pdf_files))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 49, in extract_text_from_pdf
    os.unlink(tmp_file_path)

carlosar81 commented 2 months ago

I couldn't find a solution, so I commented out the line that deletes the temporary file: #os.remove(tmp_file_path). Even so, I couldn't run it because I don't have a GPU, and I ended up resetting the PC after more than an hour. I also had to install pdf2image and poppler (poppler-24.02.0) because I was getting this error:

PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Traceback:
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\.venv\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 589, in _run_script
    exec(code, module.__dict__)
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 269, in <module>
    upload_screen()
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 166, in upload_screen
    first_processed = asyncio.run(process_pdf(uploaded_files[0], st.session_state.markdown_texts[0]))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Carlos\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 139, in process_pdf
    pdf_preview = get_pdf_preview(pdf_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\src\facturas2json.py", line 97, in get_pdf_preview
    return convert_from_bytes(pdf_file.getvalue(), first_page=1, last_page=1)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\.venv\Lib\site-packages\pdf2image\pdf2image.py", line 359, in convert_from_bytes
    return convert_from_path(
           ^^^^^^^^^^^^^^^^^^
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\.venv\Lib\site-packages\pdf2image\pdf2image.py", line 127, in convert_from_path
    page_count = pdfinfo_from_path(
                 ^^^^^^^^^^^^^^^^^^
  File "C:\py_prueba_1\PYTORCH\pytorch\facturas2json\.venv\Lib\site-packages\pdf2image\pdf2image.py", line 607, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
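pdf2image relies on Poppler's command-line tools, and the traceback above fails inside `pdfinfo_from_path`, so a quick way to diagnose this error is to check whether `pdfinfo` can be found before starting the app. The sketch below uses only the standard library; `find_poppler_dir` is a hypothetical helper name, and any directory passed to it is an assumption about where Poppler was unpacked.

```python
import os
import shutil


def find_poppler_dir(extra_dirs=()):
    """Return the directory that contains Poppler's `pdfinfo`, or None."""
    # Look in caller-supplied directories first (for example, the bin folder
    # of an unpacked Poppler release), then fall back to the regular PATH.
    for directory in extra_dirs:
        directory = os.fspath(directory)
        if shutil.which("pdfinfo", path=directory):
            return directory
    found = shutil.which("pdfinfo")
    return os.path.dirname(found) if found else None


if __name__ == "__main__":
    location = find_poppler_dir()
    if location is None:
        print("pdfinfo not found: install Poppler and add its bin folder to PATH")
    else:
        print(f"Poppler found in {location}")
```

If Poppler is installed but not on PATH, the directory returned here can be passed to pdf2image through the `poppler_path` argument of `convert_from_bytes` instead of changing the system PATH.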