Closed rossman22590 closed 8 months ago
/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36" "73.143.175.253"
webwhiz-main-python_worker-1 | [2023-10-21 05:00:56,565: INFO/MainProcess] Task worker.extract_pdf_text[c8446b21-fd22-4b9f-be71-694cf3cdf25f] received
webwhiz-main-python_worker-1 | [2023-10-21 05:00:56,568: ERROR/ForkPoolWorker-1] Task worker.extract_pdf_text[c8446b21-fd22-4b9f-be71-694cf3cdf25f] raised unexpected: FileNotFoundError("no such file: '/storage1pdfs/2acd5911931b581ff5b5a39984286de6'")
webwhiz-main-python_worker-1 | Traceback (most recent call last):
webwhiz-main-python_worker-1 |   File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 451, in trace_task
webwhiz-main-python_worker-1 |     R = retval = fun(*args, **kwargs)
webwhiz-main-python_worker-1 |   File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 734, in __protected_call__
webwhiz-main-python_worker-1 |     return self.run(*args, **kwargs)
webwhiz-main-python_worker-1 |   File "/app/worker.py", line 45, in extract_pdf_text
webwhiz-main-python_worker-1 |     return get_text_from_pdf(knowledgebase_id, pdf_path, max_pages, filename, db)
webwhiz-main-python_worker-1 |   File "/app/extract_text.py", line 38, in get_text_from_pdf
webwhiz-main-python_worker-1 |     with fitz.open(pdf_file_path) as doc:
webwhiz-main-python_worker-1 |   File "/usr/local/lib/python3.8/site-packages/fitz/fitz.py", line 3953, in __init__
webwhiz-main-python_worker-1 |     raise FileNotFoundError(msg)
webwhiz-main-python_worker-1 | fitz.fitz.FileNotFoundError: no such file: '/storage1pdfs/2acd5911931b581ff5b5a39984286de6'
webwhiz-main-web-1 | [Nest] 1 - 10/21/2023, 5:00:57 AM ERROR [ExceptionsHandler] FAILURE
webwhiz-main-web-1 | Error: FAILURE
webwhiz-main-web-1 |     at createError (/node_modules/celery-node/dist/app/result.js:15:19)
webwhiz-main-web-1 |     at /node_modules/celery-node/dist/app/result.js:77:23
webwhiz-main-web-1 |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
webwhiz-main-web-1 |     at async PdfImporterService.addPdfToDataStoreTask (/dist/importers/pdf/pdf-importer.service.js:34:29)
webwhiz-main-web-1 |     at async PdfImporterService.addPdfToDataStore (/dist/importers/pdf/pdf-importer.service.js:45:9)
Is there an API or something else that Docker is missing?
How do I make this work?
Any ideas?
Here's what's happening:
The web container downloads the PDF and passes the destination path to python_worker.
python_worker and web do not share a volume through Docker.
python_worker tries to access the non-existent file and fails.
To fix it, share a volume between web and python_worker:
version: '3'
services:
  redis:
    image: redis:alpine
    expose:
      - "6379"
  mongodb:
    image: mongo:latest
    volumes:
      - db-data:/data/db
    expose:
      - "27017"
  web:
    build: .
    command: node dist/main.js
    ports:
      - "3000:3000"
    depends_on:
      - redis
      - mongodb
    volumes:
      - uploaded-files:/storage
  nodejs_worker:
    build: .
    command: node dist/crawler.main.js
    depends_on:
      - redis
      - mongodb
  python_worker:
    build: ./workers
    depends_on:
      - redis
      - mongodb
    volumes:
      - uploaded-files:/storage
  frontend:
    build: ./frontend
    ports:
      - "3030:80"
    depends_on:
      - web
  widget:
    build: ./widget
    ports:
      - "3031:80"
    depends_on:
      - web
volumes:
  db-data:
    driver: local
  uploaded-files:
    driver: local
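If the upload still fails after changing the compose file, it can help to have the worker say why the file is missing before it reaches fitz.open. The following is only a rough sketch, not part of WebWhiz: diagnose_pdf_path is a hypothetical helper, and the default storage_root of /storage is an assumption taken from the mount point in the compose file above.

```python
from pathlib import Path


def diagnose_pdf_path(pdf_path, storage_root="/storage"):
    """Return None if pdf_path is a readable file, else a short reason string.

    Hypothetical helper: storage_root is assumed to be the mount point of
    the volume shared between the web and python_worker containers.
    """
    path = Path(pdf_path)
    root = Path(storage_root)

    # Happy path: the file is visible from inside this container.
    if path.is_file():
        return None

    # The mount point itself is absent, so the volume is likely not attached.
    if not root.is_dir():
        return f"{root} does not exist in this container; is the shared volume mounted?"

    # The path handed over by the web container points somewhere else.
    if root not in path.parents:
        return f"{path} is outside {root}; web and worker may disagree on the storage path"

    # The mount exists but the file is not in it.
    return f"{path} is missing from {root}; it may have been written to a different volume"
```

Calling it at the start of the worker task and logging any non-None result would turn the bare FileNotFoundError into a hint about which part of the setup to check.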
Thank you, I will try this tomorrow.
I would like to thank you, @iagocq, for your comment. I had the same problem of not being able to load PDF files, and with your answer I managed to solve it!
I get errors when I try to upload in Docker. Is there anything I need to configure?