pathway-labs / dropbox-ai-chat

AI-powered Dropbox search tool for private documents
https://github.com/pathwaycom/llm-app
MIT License
163 stars, 32 forks

docker compose up error #5

Open stefan52a opened 3 months ago

stefan52a commented 3 months ago

Ubuntu 20.04.6 LTS
Docker version 25.0.3, build 4debf41

ui-1   |   You can now view your Streamlit app in your browser.
ui-1   | 
ui-1   |   Network URL: http://172.19.0.3:8501
ui-1   |   External URL: http://45.138.53.66:8501
ui-1   | 
api-1  | /usr/local/lib/python3.11/site-packages/beartype/_util/hint/pep/utilpeptest.py:311: BeartypeDecorHintPep585DeprecationWarning: PEP 484 type hint typing.Sequence[str] deprecated by PEP 585. This hint is scheduled for removal in the first Python version released after October 5th, 2025. To resolve this, import this hint from "beartype.typing" rather than "typing". For further commentary and alternatives, see also:
api-1  |     https://beartype.readthedocs.io/en/latest/api_roar/#pep-585-deprecations
api-1  |   warn(
api-1  | /usr/local/lib/python3.11/site-packages/pathway/io/http/_server.py:622: UserWarning: delete_completed_queries arg of rest_connector should be set explicitly. It will soon be required.
api-1  |   warn(
api-1  | Traceback (most recent call last):
api-1  |   File "/app/main.py", line 12, in <module>
api-1  |     app_api.run(host=host, port=port)
api-1  |   File "/app/api.py", line 32, in run
api-1  |     documents = input_data.select(texts=parser(pw.this.data))
api-1  |                                         ^^^^^^^^^^^^^^^^^^^^
api-1  |   File "/usr/local/lib/python3.11/site-packages/pathway/internals/udfs/__init__.py", line 194, in __call__
api-1  |     return self.executor._apply_expression_type(
api-1  |            ^^^^^^^^^^^^^
api-1  | AttributeError: 'ParseUnstructured' object has no attribute 'executor'
api-1 exited with code 1
ui-1   | 2024-03-16 15:29:33.898 Uncaught app exception
ui-1   | Traceback (most recent call last):
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
ui-1   |     sock = connection.create_connection(
ui-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 60, in create_connection
ui-1   |     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
ui-1   |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
ui-1   |     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
ui-1   |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   | socket.gaierror: [Errno -2] Name or service not known
ui-1   | 
ui-1   | The above exception was the direct cause of the following exception:
ui-1   | 
ui-1   | Traceback (most recent call last):
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
ui-1   |     response = self._make_request(
ui-1   |                ^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 496, in _make_request
ui-1   |     conn.request(
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 400, in request
ui-1   |     self.endheaders()
ui-1   |   File "/usr/local/lib/python3.11/http/client.py", line 1293, in endheaders
ui-1   |     self._send_output(message_body, encode_chunked=encode_chunked)
ui-1   |   File "/usr/local/lib/python3.11/http/client.py", line 1052, in _send_output
ui-1   |     self.send(msg)
ui-1   |   File "/usr/local/lib/python3.11/http/client.py", line 990, in send
ui-1   |     self.connect()
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 238, in connect
ui-1   |     self.sock = self._new_conn()
ui-1   |                 ^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 205, in _new_conn
ui-1   |     raise NameResolutionError(self.host, self, e) from e
ui-1   | urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPConnection object at 0x7f67dd4c0550>: Failed to resolve 'api' ([Errno -2] Name or service not known)
ui-1   | 
ui-1   | The above exception was the direct cause of the following exception:
ui-1   | 
ui-1   | Traceback (most recent call last):
ui-1   |   File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
ui-1   |     resp = conn.urlopen(
ui-1   |            ^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
ui-1   |     retries = retries.increment(
ui-1   |               ^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
ui-1   |     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
ui-1   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='api', port=8080): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f67dd4c0550>: Failed to resolve 'api' ([Errno -2] Name or service not known)"))
ui-1   | 
ui-1   | During handling of the above exception, another exception occurred:
ui-1   | 
ui-1   | Traceback (most recent call last):
ui-1   |   File "/usr/local/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
ui-1   |     exec(code, module.__dict__)
ui-1   |   File "/app/ui.py", line 25, in <module>
ui-1   |     response = requests.post(url, json=data)
ui-1   |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 115, in post
ui-1   |     return request("post", url, data=data, json=json, **kwargs)
ui-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 59, in request
ui-1   |     return session.request(method=method, url=url, **kwargs)
ui-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
ui-1   |     resp = self.send(prep, **send_kwargs)
ui-1   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
ui-1   |     r = adapter.send(request, **kwargs)
ui-1   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ui-1   |   File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 519, in send
ui-1   |     raise ConnectionError(e, request=request)
ui-1   | requests.exceptions.ConnectionError: HTTPConnectionPool(host='api', port=8080): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f67dd4c0550>: Failed to resolve 'api' ([Errno -2] Name or service not known)"))
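
Reading the two tracebacks together: api-1 exits with code 1 first, and only afterwards does the UI fail to resolve the host name 'api'. On a Compose network a service name typically stops resolving once its container has exited, so the UI error is most likely a consequence of the API crash rather than a separate problem. A minimal sketch for checking this from inside the UI container, using the same socket.getaddrinfo lookup that appears in the traceback (the helper name can_resolve is made up for illustration):

```python
import socket


def can_resolve(host: str, port: int) -> bool:
    """Return True if `host` resolves to at least one address."""
    try:
        # Same kind of lookup urllib3 performs before opening a connection.
        results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return len(results) > 0
    except socket.gaierror:
        # e.g. "[Errno -2] Name or service not known"
        return False


if __name__ == "__main__":
    # 'api' and 8080 are the host and port shown in the traceback.
    print("api resolves:", can_resolve("api", 8080))
```

If this prints False while the api container is stopped and True once it stays up, the name resolution error needs no separate fix.
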
stefan52a commented 3 months ago

If the Dockerfile uses Python 3.10:

# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory in the container
WORKDIR /app
# Copy the dependencies file to the working directory
COPY requirements.txt requirements.txt
# Install any dependencies
RUN pip install --upgrade -r requirements.txt
# Copy the content of the local repo
COPY . .

it gives the same error.

stefan52a commented 3 months ago

Which version of pathway is actually supposed to be used?
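
To see which versions pip actually resolved inside the image or virtual environment, something like the sketch below can help. It only relies on the standard library's importlib.metadata; the function name installed_versions is an illustrative choice, and the package names are the ones from this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["pathway", "unstructured"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in the api container and in a working llm-app environment would make a version mismatch visible.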

stefan52a commented 3 months ago

The build seemed to work for me with the steps below (some of them may be redundant):

I started with a clean Ubuntu 22.04 server system on an i9 with an NVIDIA graphics card (not needed; running within Proxmox).

Install Python 3.11:

sudo apt update
sudo apt upgrade

sudo apt install --reinstall pkg-config cmake-data

sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev libsqlite3-dev wget libbz2-dev
wget https://www.python.org/ftp/python/3.11.0/Python-3.11.0.tgz
tar -xf Python-3.11.0.tgz
cd Python-3.11.0/
./configure --enable-optimizations
make -j$(nproc)
sudo make altinstall
python3.11 --version
nano ~/.bashrc
add: alias python3='/usr/local/bin/python3.11'
source ~/.bashrc
python3 --version

I modified the Dockerfile to:

# Use an official Python runtime as a parent image
FROM python:3.11

# -y keeps the install non-interactive during the image build
RUN apt-get update && apt-get install -y build-essential

# Set the working directory in the container
WORKDIR /app
# Copy the dependencies file to the working directory
COPY requirements.txt requirements.txt

# Install any dependencies
RUN pip install --upgrade -r requirements.txt
# Copy the content of the local repo
COPY . .

Then follow the README for installing via Docker.

However I still get:

dropbox-ai-chat-api-1  | /usr/local/lib/python3.11/site-packages/beartype/_util/hint/pep/utilpeptest.py:311: BeartypeDecorHintPep585DeprecationWarning: PEP 484 type hint typing.Sequence[str] deprecated by PEP 585. This hint is scheduled for removal in the first Python version released after October 5th, 2025. To resolve this, import this hint from "beartype.typing" rather than "typing". For further commentary and alternatives, see also:
dropbox-ai-chat-api-1  |     https://beartype.readthedocs.io/en/latest/api_roar/#pep-585-deprecations
dropbox-ai-chat-api-1  |   warn(
dropbox-ai-chat-api-1  | /usr/local/lib/python3.11/site-packages/pathway/io/http/_server.py:622: UserWarning: delete_completed_queries arg of rest_connector should be set explicitly. It will soon be required.
dropbox-ai-chat-api-1  |   warn(
dropbox-ai-chat-api-1  | Traceback (most recent call last):
dropbox-ai-chat-api-1  |   File "/app/main.py", line 12, in <module>
dropbox-ai-chat-api-1  |     app_api.run(host=host, port=port)
dropbox-ai-chat-api-1  |   File "/app/api.py", line 32, in run
dropbox-ai-chat-api-1  |     documents = input_data.select(texts=parser(pw.this.data))
dropbox-ai-chat-api-1  |                                         ^^^^^^^^^^^^^^^^^^^^
dropbox-ai-chat-api-1  |   File "/usr/local/lib/python3.11/site-packages/pathway/internals/udfs/__init__.py", line 194, in __call__
dropbox-ai-chat-api-1  |     return self.executor._apply_expression_type(
dropbox-ai-chat-api-1  |            ^^^^^^^^^^^^^
dropbox-ai-chat-api-1  | AttributeError: 'ParseUnstructured' object has no attribute 'executor'
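
The thread does not establish why 'executor' is missing. One common way this kind of AttributeError arises in Python is a subclass whose __init__ never calls the base class constructor that sets the attribute, which can happen when a library and code written against another version of it disagree. The sketch below reproduces that mechanism with purely hypothetical classes (BaseUDF, BrokenParser, FixedParser); it is not Pathway's code and may not be the actual cause here.

```python
class BaseUDF:
    """Hypothetical stand-in for a framework base class."""

    def __init__(self):
        # The base constructor is what creates `executor`.
        self.executor = "sync-executor"

    def __call__(self, value):
        # Raises AttributeError if the constructor above never ran.
        return f"{self.executor} applied to {value}"


class BrokenParser(BaseUDF):
    def __init__(self, mode="single"):
        # Bug: super().__init__() is not called, so `executor` is never set.
        self.mode = mode


class FixedParser(BaseUDF):
    def __init__(self, mode="single"):
        super().__init__()
        self.mode = mode


if __name__ == "__main__":
    print(FixedParser()("data"))
    try:
        BrokenParser()("data")
    except AttributeError as err:
        print("AttributeError:", err)
```
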
stefan52a commented 3 months ago

Digging further, I looked at https://github.com/pathwaycom/llm-app. Running its Docker version seemed to work:

curl --data '{"user": "user", "query": "How to connect to Kafka in Pathway?"}' http://localhost:8080/

If I find time I will try to compare code with dropbox-ai-chat and maybe find the issue.
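
For scripting the same check, the curl call above can be approximated in Python. This is only a sketch: the function names are illustrative, the URL and payload fields are the ones from the curl command, and the JSON content type mirrors the requests.post(url, json=data) call visible in the ui.py traceback (plain curl --data sends a form content type instead).

```python
import json
import urllib.request


def build_query_request(user, query, url="http://localhost:8080/"):
    """Build a POST request carrying the same JSON body as the curl call."""
    payload = json.dumps({"user": user, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(request, timeout=30):
    """Send the request and return the response body as text.

    Raises urllib.error.URLError if the API is not reachable.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the API container running, send_query(build_query_request("user", "How to connect to Kafka in Pathway?")) should return the same answer body that curl prints.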

stefan52a commented 3 months ago

See the bottom of this comment (after SOLUTION) for the solution that worked for me.

So I decided to use poetry (as done in https://github.com/pathwaycom/llm-app) and build without Docker (again, some steps may be redundant):

python3 -m venv pw-env && source pw-env/bin/activate
python3 -m pip install --user pipx
python3 -m pipx ensurepath
(open a new terminal)

python3 -m venv pw-env && source pw-env/bin/activate
pipx install poetry
poetry completions bash >> ~/.bash_completion
poetry init
(answer the questions with the packages from requirements.txt; you can add packages later, e.g. poetry add tiktoken, if you forgot something)

python main.py

Alas, the same error: https://github.com/pathway-labs/dropbox-ai-chat/issues/5#issue-2190105798

However, I had first tried with unstructured = {version = "0.10.15", extras = ["all-docs"]}

I removed that line from pyproject.toml and ran poetry add unstructured

which, among other things, told me I had to make python point to python3.11.

So I added

alias python=python3

to ~/.bashrc, ran source ~/.bashrc, and again:

python -m venv pw-env && source pw-env/bin/activate

python --version gave Python 3.11.0

however

poetry add unstructured

still complained

So I did:

poetry add "unstructured[all-docs]@^0.11.8"
poetry update
(deleted the poetry.lock file)
poetry install
poetry run python main.py

It seemed that I had to recompile Python (hahah), because it reported that there was no module named _lzma:

sudo apt-get install lzma
sudo apt-get install liblzma-dev
sudo apt-get install libbz2-dev

So I did recompile Python (see an earlier comment above for how).
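
When Python is built from source, standard-library modules such as lzma and bz2 are only compiled if the matching development headers (liblzma-dev, libbz2-dev) are present at build time. A quick sketch for checking a freshly built interpreter before installing anything on top of it (the helper name missing_modules and the list of modules to probe are illustrative choices):

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # These modules depend on system libraries being present at compile time.
    print("missing:", missing_modules(["lzma", "bz2", "sqlite3", "ssl"]))
```

An empty list means the rebuild picked up the libraries; any name printed points at a -dev package that was still missing when Python was compiled.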

then again

poetry run python main.py gave the same error: AttributeError: 'ParseUnstructured' object has no attribute 'executor'

Next, in pyproject.toml I changed python = "^3.11" to python = "3.11".

I then got an error: because dropbox-ai-chat depends on both pathway (^0.8.3) and unstructured[all-docs] (^0.12.6), version solving failed.

SOLUTION: I used poetry and changed pyproject.toml (copying parts from https://github.com/pathwaycom/llm-app) to:

[tool.poetry]
name = "dropbox-ai-chat"
version = "0.1.0"
description = "AI-powered Dropbox search tool for private documents"
authors = ["many"]
license = "MIT"
readme = "README.md"

[tool.poetry.dependencies]
python = "3.11"
DateTime = "^5.4"
python-dotenv = "^1.0.1"
pathway = "=0.8.2"
openai = "^1.2.4"
requests = "^2.31.0"
streamlit = "^1.26.0"
unstructured = { extras = ["all-docs"], version = "^0.11.8"}
tiktoken = { version = "^0.6.0"}
litellm = "^1.18.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Then:

poetry lock
poetry install
poetry run python main.py
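
For readers comparing the constraints: in the working pyproject.toml, pathway is pinned exactly ("=0.8.2"), while most other entries use Poetry's caret operator, which allows updates that do not change the left-most non-zero version component. The sketch below computes the range a caret constraint stands for; caret_range is a hypothetical helper written for this note, not part of Poetry, and it only handles plain numeric versions.

```python
def caret_range(version: str) -> tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for ^version."""
    parts = [int(p) for p in version.split(".")]
    # Index of the left-most non-zero component; if all are zero,
    # the last component that was written is the one that gets bumped.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1]
    # Pad both bounds to major.minor.patch.
    lower = parts + [0] * (3 - len(parts))
    upper = upper + [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


if __name__ == "__main__":
    for spec in ["0.8.3", "0.11.8", "0.12.6", "1.2.4"]:
        low, high = caret_range(spec)
        print(f"^{spec} means >={low},<{high}")
```

Under these rules ^0.8.3 can never select pathway 0.8.2, and ^0.11.8 and ^0.12.6 describe disjoint ranges of unstructured, so the pinned file asks the resolver for a different combination than the one that failed.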

stefan52a commented 3 months ago

Server: poetry run python main.py
UI: streamlit run ui.py