Docker install instructions in the Readme are not working (improvement suggestion)

Lissanro commented 1 week ago

When I tried installing openedai-vision in a Docker container, I ran into multiple issues (mentioned in #20), but in the end I was able to figure out the minimal steps required to get it working on a PC with Nvidia cards. Since Nvidia is, I think, the most common platform, and you already have Nvidia-specific information in the Readme, adding the steps needed to actually get the project working on one of the most popular platforms could be useful. It may help users on other platforms/distributions as well, because it gives them a clue about which steps they need to follow.

Here are steps to get the docker container running on Linux with Nvidia GPUs:

1) Run cp vision.sample.env vision.env and uncomment the wanted model; optionally point HF_HOME to /home/username/.cache/huggingface/ if the model is already downloaded there
2) Uncomment runtime: nvidia in docker-compose.yml
3) Install nvidia-container-toolkit using the instructions from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
4) Run sudo service docker restart
5) Run docker compose up
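
As a side note, one way to check that Docker actually sees the Nvidia runtime after step 4 is to look at the runtimes reported by docker info. The sketch below is only an illustration and not part of the project: the function names are made up, and it assumes the toolkit registers a runtime named nvidia, matching the runtime: nvidia line in docker-compose.yml.

```python
import json
import subprocess


def docker_runtimes_json() -> str:
    """Return the runtimes known to the Docker daemon as a JSON string."""
    # `docker info --format` takes a Go template; `.Runtimes` is a map
    # keyed by runtime name.
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """True if a runtime named "nvidia" appears in the JSON map."""
    # Assumption: the runtime is registered under the name "nvidia".
    runtimes = json.loads(runtimes_json) or {}
    return "nvidia" in runtimes
```

Calling has_nvidia_runtime(docker_runtimes_json()) on the host should return True once the toolkit is installed and Docker has been restarted. If it returns False, the configuration part of step 3 is probably the place to look.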

For non-Nvidia platforms, steps 2 and 3 would be different. Note: old packages like "nvidia-container-runtime" are no longer needed, and Nvidia has officially marked them as deprecated. Only nvidia-container-toolkit is needed to get things working. I hope this information helps to improve the Readme. Please feel free to close this report: I have already solved the Docker installation for myself, so this is just a suggestion for improving the Readme instructions.

matatonic commented 1 week ago

This is actually more complex than is really needed. If you start with step 3 and complete the install as documented there, steps 2 & 4 are not needed (nvidia will be the default and docker is restarted at this step: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker). Steps 1 & 5 are already documented. I do link to the nvidia-container-toolkit docs, which you provided, thanks for that, but only as a FAQ/help item.

I get your point though. I don't really try to make it easy to get docker set up, so I'll consider how to update the docs to make it simpler. People having trouble with nvidia+docker is a common problem for all the projects that prefer docker deployment: open-webui, litellm, etc.

Lissanro commented 1 week ago

Perhaps consider adding a comment in the docker instructions to mention that an additional package may be necessary for GPU support in the docker container, and to check item number 4 under "Known Problems & Workarounds" for details. Currently these steps are guaranteed to fail for everyone who does not have the docker-specific requirements preinstalled. There may be other ways to describe the requirements, and this is just an example, but as long as the sequence of installation steps is clear, any location in the Readme is fine. If the correct sequence of steps had been there, it would have saved me a lot of time when I was getting docker up and running with GPU support. Like I said, it is just a suggestion, but please consider it, taking into account the first-time experience of users who have not used docker with a GPU before.

matatonic commented 1 week ago

Latest code has some updates for 3.12 & docker. See https://github.com/matatonic/openedai-vision/commit/a64740fb7187c71ba62413a3049636ea940d87f8

Lissanro commented 1 week ago

OK, thanks. I see that you updated the Readme as well. The current instructions look good and make it clear that the Nvidia container toolkit is needed. I am going to test them by tomorrow (or maybe this evening, if I find the time) and report back on whether there are any issues (probably there will not be, assuming my original steps contained some unnecessary commands, as you have mentioned).

Lissanro commented 1 week ago

OK, it mostly worked. The docker container itself got up and running, the docker GPU support is there, and the new Readme clearly mentions how to get it if it is not installed already. But to my surprise, I encountered issue #19, even though it was solved a long time ago. And this was in a repository that I cloned from scratch not that long ago when testing this again. Perhaps something just got cached, I do not know.

The full error log:

openedai-vision  | INFO:     172.19.0.1:45348 - "POST /v1/chat/completions HTTP/1.1" 200 OK
openedai-vision  | ERROR:    Exception in ASGI application
openedai-vision  |   + Exception Group Traceback (most recent call last):
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 188, in __call__
openedai-vision  |   |     await response(scope, wrapped_receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 222, in __call__
openedai-vision  |   |     async for chunk in self.body_iterator:
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 179, in body_stream
openedai-vision  |   |     raise app_exc
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
openedai-vision  |   |     await self.app(scope, receive_or_disconnect, send_no_error)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
openedai-vision  |   |     await self.app(scope, receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
openedai-vision  |   |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
openedai-vision  |   |     raise exc
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
openedai-vision  |   |     await app(scope, receive, sender)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
openedai-vision  |   |     await self.middleware_stack(scope, receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
openedai-vision  |   |     await route.handle(scope, receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
openedai-vision  |   |     await self.app(scope, receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
openedai-vision  |   |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
openedai-vision  |   |     raise exc
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
openedai-vision  |   |     await app(scope, receive, sender)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
openedai-vision  |   |     await response(scope, receive, send)
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/sse_starlette/sse.py", line 275, in __call__
openedai-vision  |   |     async with anyio.create_task_group() as task_group:
openedai-vision  |   |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
openedai-vision  |   |     raise BaseExceptionGroup(
openedai-vision  |   | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
openedai-vision  |   +-+---------------- 1 ----------------
openedai-vision  |     | Traceback (most recent call last):
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
openedai-vision  |     |     result = await app(  # type: ignore[func-returns-value]
openedai-vision  |     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
openedai-vision  |     |     return await self.app(scope, receive, send)
openedai-vision  |     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
openedai-vision  |     |     await super().__call__(scope, receive, send)
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
openedai-vision  |     |     await self.middleware_stack(scope, receive, send)
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
openedai-vision  |     |     raise exc
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
openedai-vision  |     |     await self.app(scope, receive, _send)
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
openedai-vision  |     |     with collapse_excgroups():
openedai-vision  |     |   File "/usr/local/lib/python3.11/contextlib.py", line 158, in __exit__
openedai-vision  |     |     self.gen.throw(typ, value, traceback)
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
openedai-vision  |     |     raise exc
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/sse_starlette/sse.py", line 278, in wrap
openedai-vision  |     |     await func()
openedai-vision  |     |   File "/usr/local/lib/python3.11/site-packages/sse_starlette/sse.py", line 258, in stream_response
openedai-vision  |     |     async for data in self.body_iterator:
openedai-vision  |     |   File "/app/vision.py", line 59, in streamer
openedai-vision  |     |     async for resp in vision_qna.stream_chat_with_images(request):
openedai-vision  |     |   File "/app/backend/qwen2-vl.py", line 54, in stream_chat_with_images
openedai-vision  |     |     msg = { 'role': m.role, 'content': c.text }
openedai-vision  |     |                                        ^
openedai-vision  |     | UnboundLocalError: cannot access local variable 'c' where it is not associated with a value
openedai-vision  |     +------------------------------------
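
For anyone not familiar with this kind of error: the last line of the traceback says that the local variable c was read before anything was assigned to it. The sketch below shows one common way this can happen and one way to avoid it. It is a hypothetical reduction, not the actual code from backend/qwen2-vl.py: the Part and Message classes and both functions are invented for illustration, and the real cause in the project may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    # Hypothetical stand-in for one piece of message content.
    type: str
    text: str = ""


@dataclass
class Message:
    # Hypothetical stand-in for a chat message.
    role: str
    content: list = field(default_factory=list)


def build_msg_buggy(m):
    # `c` is only bound if the loop body runs at least once.
    for c in m.content:
        if c.type == "text":
            break
    # With empty content, `c` was never assigned, so this line raises
    # UnboundLocalError.
    return {"role": m.role, "content": c.text}


def build_msg_safe(m):
    text = ""  # bound before the loop, so it exists even for empty content
    for c in m.content:
        if c.type == "text":
            text = c.text
    return {"role": m.role, "content": text}
```

Calling build_msg_buggy(Message(role="user")) raises UnboundLocalError, while build_msg_safe returns a message with empty content for the same input.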

As a workaround, I can run docker exec --interactive --tty --user root openedai-vision bash to log into the container, then run apt update && apt install -y nano, and then use nano /app/backend/qwen2-vl.py to manually apply the fix suggested in the first message of issue #19. Then it works.

But it is a mystery to me why I get the old code in the docker container, even though the code in the repository is definitely fresh and does not have this issue. It was also non-trivial to figure out how to edit files inside the docker container. Since I already got the manual installation method working, I do not really need docker, so the only reason I test it is to contribute and help discover issues, so that other users can get things working more reliably in the future. In this case, there may be a misconfiguration on my end, or something related to docker cached outside the repository, but I found no way to refresh the docker container to the up-to-date source code other than patching it manually.

matatonic commented 1 week ago

so weird... maybe it's github workers... I've had a bunch of trouble with them recently. pip not working, disk filling, etc. I'm looking into it.

matatonic commented 1 week ago

confirmed.... the latest image has old code !?

matatonic commented 1 week ago

I checked the logs and it even shows it pulled 0.39.1... and was tagged :latest... I have no idea, maybe it was some github / ghcr.io server issue? The latest image is confirmed correct, 0.39.2.

matatonic commented 1 week ago

to build the docker from source you can just:

docker compose build

Sometimes you may need to:

docker compose build --no-cache

Use this if you want to make sure the build pulls fresh github repos, new pip versions, etc., from inside the docker.

matatonic / openedai-vision

Docker install instructions in the Readme are not working (improvement suggestion) #23