kuwaai / genai-os

Kuwa GenAI OS: An open, free, secure, and privacy-focused Generative-AI Orchestrating System.
https://kuwaai.tw/os/intro
MIT License

[docker, web-qa]: Unable to fetch document using Selenium #50

Open ifTNT opened 1 week ago

ifTNT commented 1 week ago

Description:

The WebQA bot fails to retrieve documents with Selenium and returns the error message: "An error occurred while trying to fetch the document. Please make sure the submitted document exists and is publicly available."

Steps to Reproduce:

  1. Attempt to summarize the content of the URL: https://management.ntu.edu.tw/IM using WebQA.
  2. Observe the error message: "An error occurred while trying to fetch the document. Please make sure the submitted document exists and is publicly available."

Expected Outcome:

WebQA should successfully fetch and summarize the document from the provided URL.

Environment Details:

Additional context:

The full log:

docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Discovering versions from https:/04:09:32 [7/1509]
.github.io/chrome-for-testing/known-good-versions-with-downloads.json
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Required driver: chromedriver 128.0.6613.119
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Downloading chromedriver 128.0.6613.119 from https://storage.googleapis.com/chrome-for-testing-public/128.0.6613.119/linux64/chromedriver-linux64.zip
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Driver path: /root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.selenium_manager] DEBUG:    Browser path: /root/.cache/selenium/chrome/linux64/128.0.6613.119/chrome
docqa-executor-1    | 2024-09-05 20:09:32 [selenium.webdriver.common.service] DEBUG:    Started executable: `/root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver` in a child process with pid: 104 using 0 to output -3
docqa-executor-1    | 2024-09-05 20:09:32 [src.crawler  ] WARNING:  Message: Service /root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver unexpectedly exited. Status code was: 127
docqa-executor-1    |
docqa-executor-1    | 2024-09-05 20:09:32 [asyncio      ] ERROR:    Unclosed client session
docqa-executor-1    | client_session: <aiohttp.client.ClientSession object at 0x77cedaa60d60>
docqa-executor-1    | 2024-09-05 20:09:32 [asyncio      ] ERROR:    Unclosed connector
docqa-executor-1    | connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x77cea18f9f00>, 244810.599)]']
docqa-executor-1    | connector: <aiohttp.connector.TCPConnector object at 0x77cedaa609a0>
docqa-executor-1    | 2024-09-05 20:09:32 [__main__     ] ERROR:    Error when constructing document store.
docqa-executor-1    | Traceback (most recent call last):
docqa-executor-1    |   File "/usr/src/app/docqa/docqa.py", line 155, in doc_qa
docqa-executor-1    |     document_store, docs = await self.document_store_factory.construct_document_store(
docqa-executor-1    |   File "/usr/src/app/docqa/src/document_store_factory.py", line 147, in construct_document_store
docqa-executor-1    |     document_store, docs = await self._construct_document_store(urls, document_store_kwargs, ttl_hash)
docqa-executor-1    |   File "/usr/src/app/docqa/src/document_store_factory.py", line 45, in __await__
docqa-executor-1    |     self.result = yield from self.co.__await__()
docqa-executor-1    |   File "/usr/src/app/docqa/src/document_store_factory.py", line 118, in _construct_document_store
docqa-executor-1    |     if len(docs) == 0: raise RuntimeError("Error fetching documents.")
docqa-executor-1    | RuntimeError: Error fetching documents.
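
A note on the log above: on Linux, exit status 127 from a freshly started binary often means the dynamic loader could not find a shared library the binary links against. That is only a likely reading of the `Status code was: 127` line, not something the log itself proves. A minimal sketch for checking it inside the container, assuming `ldd` is available there; the helper names `parse_missing_libs` and `missing_libs` are hypothetical, and the path is the driver path reported in the log:

```python
import re
import subprocess

# Driver path as reported by Selenium Manager in the log above.
CHROMEDRIVER = "/root/.cache/selenium/chromedriver/linux64/128.0.6613.119/chromedriver"


def parse_missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved libraries as: "<name> => not found"
        match = re.match(r"\s*(\S+)\s+=>\s+not found", line)
        if match:
            missing.append(match.group(1))
    return missing


def missing_libs(binary: str) -> list[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=False
    )
    return parse_missing_libs(result.stdout)
```

Calling `missing_libs(CHROMEDRIVER)` inside the `docqa-executor` container would list any unresolved libraries; the same check can be pointed at the Chrome binary under the browser path from the log.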

ifTNT commented 1 week ago

Progress update:

  1. Installed the missing dependency.
  2. A new error message now appears:
    [src.crawler  ] WARNING:  Message: session not created: Chrome failed to start: exited normally.
    (session not created: DevToolsActivePort file doesn't exist)
    (The process started from chrome location /root/.cache/selenium/chrome/linux64/128.0.6613.119/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
    Stacktrace:
    #0 0x6227c47ea86a <unknown>
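
The `DevToolsActivePort file doesn't exist` error is commonly seen when Chrome is launched as root inside a container without headless and sandbox-related flags. Whether that is the cause here is not confirmed in this thread, and how `src.crawler` builds its driver is not shown, so the following is only a sketch of the usual workaround. The names `CONTAINER_CHROME_ARGS`, `build_chrome_args`, and `make_driver` are hypothetical:

```python
# Flags that are commonly passed when Chrome runs as root in a container.
# This set is an assumption, not taken from the project's code.
CONTAINER_CHROME_ARGS = [
    "--headless=new",           # no display server is available in the container
    "--no-sandbox",             # Chrome refuses to start sandboxed as root
    "--disable-dev-shm-usage",  # /dev/shm is often very small in containers
]


def build_chrome_args(extra=None):
    """Return the container flags plus any extra flags, without duplicates."""
    args = list(CONTAINER_CHROME_ARGS)
    for arg in extra or []:
        if arg not in args:
            args.append(arg)
    return args


def make_driver(extra=None):
    """Create a Chrome WebDriver configured with the flags above."""
    # Imported lazily so build_chrome_args works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args(extra):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

With this in place, `make_driver().get("https://management.ntu.edu.tw/IM")` would exercise the same URL as in the reproduction steps.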