webwhiz-ai / webwhiz

WebWhiz allows you to create an AI chatbot that knows everything about your product and can instantly respond to your customer's queries.
https://www.webwhiz.ai/
GNU Affero General Public License v3.0
918 stars 153 forks

Upload PDF #111

Closed rossman22590 closed 8 months ago

rossman22590 commented 11 months ago

I get errors when I try to upload a PDF in Docker. Is there anything I need to configure?

rossman22590 commented 11 months ago

/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36" "73.143.175.253"
webwhiz-main-python_worker-1 | [2023-10-21 05:00:56,565: INFO/MainProcess] Task worker.extract_pdf_text[c8446b21-fd22-4b9f-be71-694cf3cdf25f] received
webwhiz-main-python_worker-1 | [2023-10-21 05:00:56,568: ERROR/ForkPoolWorker-1] Task worker.extract_pdf_text[c8446b21-fd22-4b9f-be71-694cf3cdf25f] raised unexpected: FileNotFoundError("no such file: '/storage1pdfs/2acd5911931b581ff5b5a39984286de6'")
webwhiz-main-python_worker-1 | Traceback (most recent call last):
webwhiz-main-python_worker-1 |   File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 451, in trace_task
webwhiz-main-python_worker-1 |     R = retval = fun(*args, **kwargs)
webwhiz-main-python_worker-1 |   File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 734, in __protected_call__
webwhiz-main-python_worker-1 |     return self.run(*args, **kwargs)
webwhiz-main-python_worker-1 |   File "/app/worker.py", line 45, in extract_pdf_text
webwhiz-main-python_worker-1 |     return get_text_from_pdf(knowledgebase_id, pdf_path, max_pages, filename, db)
webwhiz-main-python_worker-1 |   File "/app/extract_text.py", line 38, in get_text_from_pdf
webwhiz-main-python_worker-1 |     with fitz.open(pdf_file_path) as doc:
webwhiz-main-python_worker-1 |   File "/usr/local/lib/python3.8/site-packages/fitz/fitz.py", line 3953, in __init__
webwhiz-main-python_worker-1 |     raise FileNotFoundError(msg)
webwhiz-main-python_worker-1 | fitz.fitz.FileNotFoundError: no such file: '/storage1pdfs/2acd5911931b581ff5b5a39984286de6'
webwhiz-main-web-1 | [Nest] 1 - 10/21/2023, 5:00:57 AM ERROR [ExceptionsHandler] FAILURE
webwhiz-main-web-1 | Error: FAILURE
webwhiz-main-web-1 |     at createError (/node_modules/celery-node/dist/app/result.js:15:19)
webwhiz-main-web-1 |     at /node_modules/celery-node/dist/app/result.js:77:23
webwhiz-main-web-1 |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
webwhiz-main-web-1 |     at async PdfImporterService.addPdfToDataStoreTask (/dist/importers/pdf/pdf-importer.service.js:34:29)
webwhiz-main-web-1 |     at async PdfImporterService.addPdfToDataStore (/dist/importers/pdf/pdf-importer.service.js:45:9)
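
A note on the path in that traceback: '/storage1pdfs/...' looks like a storage directory and a pdfs/ subfolder glued together with no separator. The sketch below shows one plausible way such a path can arise. It is an assumption, not WebWhiz's actual code: the helper name `build_pdf_path`, the plain string concatenation, and the working directory `/` are all hypothetical and chosen only to reproduce the string seen in the log.

```python
import posixpath


def build_pdf_path(storage_location: str, file_id: str, cwd: str = "/") -> str:
    """Hypothetical helper (not from WebWhiz): concatenate the storage
    location with a 'pdfs/' subfolder, then resolve against the working dir."""
    # Plain concatenation inserts no separator, so the configured value
    # must already end with "/" for the subfolder to be a real subfolder.
    raw = storage_location + "pdfs/" + file_id
    return posixpath.normpath(posixpath.join(cwd, raw))


file_id = "2acd5911931b581ff5b5a39984286de6"

# Storage location without a trailing slash: the folder names run together.
print(build_pdf_path("./storage1", file_id))

# Storage location with a trailing slash: 'pdfs' becomes a proper subfolder.
print(build_pdf_path("/storage/", file_id))
```

If the path really is built this way, the first call reproduces the '/storage1pdfs/...' string from the error, while a storage location ending in a slash yields a path inside a pdfs/ directory.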

rossman22590 commented 11 months ago

Is there an API or something else that Docker is missing?

rossman22590 commented 11 months ago

How do I make this work?

rossman22590 commented 10 months ago

Any ideas?

iagocq commented 10 months ago

Here's what's happening:

To fix it:

  1. In .env.docker, change DOC_STORAGE_LOCATION=./storage1 to DOC_STORAGE_LOCATION=/storage/
  2. Replace docker-compose.yml with the following content. It adds a shared volume for web and python_worker.
version: '3'

services:
  redis:
    image: redis:alpine
    expose:
      - "6379"

  mongodb:
    image: mongo:latest
    volumes:
      - db-data:/data/db
    expose:
      - "27017"

  web:
    build: .
    command: node dist/main.js
    ports:
      - "3000:3000"
    depends_on:
      - redis
      - mongodb
    volumes:
      - uploaded-files:/storage

  nodejs_worker:
    build: .
    command: node dist/crawler.main.js
    depends_on:
      - redis
      - mongodb

  python_worker:
    build: ./workers
    depends_on:
      - redis
      - mongodb
    volumes:
      - uploaded-files:/storage

  frontend:
    build: ./frontend
    ports:
      - "3030:80"
    depends_on:
      - web

  widget:
    build: ./widget
    ports:
      - "3031:80"
    depends_on:
      - web

volumes:
  db-data:
    driver: local
  uploaded-files:
    driver: local
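
After rebuilding with the shared volume, a quick way to confirm that the worker can see the storage directory is a small check like the one below. This is only a sketch: the function name `describe_storage` is hypothetical, and the `/storage` default assumes the mount point from the compose file above.

```python
import os


def describe_storage(path: str) -> dict:
    """Report whether `path` exists, is a directory, and is readable/writable
    for the current process. Hypothetical helper, not part of WebWhiz."""
    return {
        "path": path,
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


# "/storage" is the mount point assumed from the compose file above.
for key, value in describe_storage("/storage").items():
    print(f"{key}: {value}")
```

Saved under a name of your choosing and run inside the python_worker container (for example via `docker compose exec python_worker python <your script>`), it should report the directory as existing and readable. If it does not, the volume is probably not mounted in that container.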

rossman22590 commented 9 months ago

Thank you, I will try this tomorrow.

ConsulIam commented 2 months ago

I would like to thank you, @iagocq, for your comment. I had the same problem of not being able to upload PDF files, and with your answer I managed to solve it!