toverainc / willow-inference-server

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
Apache License 2.0
368 stars · 31 forks

Publish Docker image #81

skorokithakis opened this issue 1 year ago

skorokithakis commented 1 year ago

Is it possible to publish a Docker image so we can just `docker run --gpus=all toverainc/willow-inference-server` and have it work? That would make it trivial to launch our own server.

kristiankielhofner commented 1 year ago

Definitely!

We're saving the push of official Docker images for willow build as well as WIS for a 1.0 release. Both are under significant active development, and we frequently have users test different branches, apply local fixes, etc., which in many cases isn't possible if they're only running pulled Docker images.

With 1.0 we'll have enough confidence in the testing, feedback, etc we've received from the community to publish docker images and provide a quality user experience with them.

If you look at the wisng branch you'll see a move towards docker compose, etc., which will enable a more typical Docker experience of "a docker run command" and/or "clone repo, start".
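To sketch what that might eventually look like, here's a minimal compose file for a GPU-enabled WIS service. The image tag, port, and cache path are illustrative assumptions, not published values; check the wisng branch for the real files:

```yaml
# Hypothetical docker-compose.yml for WIS with GPU access.
# Image tag, port, and volume path are assumptions for illustration only.
services:
  wis:
    image: toverainc/willow-inference-server:latest   # assumed tag
    restart: unless-stopped
    ports:
      - "19000:19000"          # assumed API port
    volumes:
      - ./cache:/app/cache     # assumed model cache location
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

With a file like that in place, `docker compose up -d` is the whole "clone repo, start" experience.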

sparkydave1981 commented 1 year ago

Further to this, it would be awesome if this could be set up as a Home Assistant add-on so we can install it on HA servers running HAOS. My understanding is that HA add-ons are basically just Docker containers, but I have no experience with or full knowledge of Docker.

kristiankielhofner commented 1 year ago

We can certainly look at this, but many of the Home Assistant integration aspects of Willow (a WIS HAOS container, a Home Assistant component for Willow, etc.) are fairly significant efforts in their own right. @stintel and I are stretched pretty thin between Willow and WIS, and this is an area where we think community contribution/collaboration would be very beneficial! It doesn't help that we have no experience with either of those, and we'd have to get proficient first :).

tensiondriven commented 1 year ago

@sparkydave1981 What functionality would you be looking to enable by providing Home Assistant integration?

If this were done, my hope would be that it would be decoupled from willow-inference-server. Just as one doesn't typically run Frigate inference (NVR, video recording) directly on their Home Assistant box, I imagine people will want to run Willow on a different machine than the one running Home Assistant.

Going back to the topic of the Docker image: in my case I'm running a NAS with Unraid and an Nvidia card. This would be the natural place for me to install Willow as a 24/7 service, and as far as I know Unraid's Docker support doesn't include docker-compose, so it would be important to have a vanilla `docker run` option available. (I'm able to do this with stable-diffusion, for example.)

kristiankielhofner commented 1 year ago

I'm learning more and more that "one doesn't typically do XYZ" doesn't really apply. People have done, suggested, and asked for all kinds of wild things I would never have considered, think are terrible ideas, are doomed to failure, etc. The deployment configurations are practically endless. There are people who talk about things like passing a GPU to a VM to run rootless podman nested inside unprivileged LXC containers... It never ends!

docker-compose YAML files aren't only handy when you actually have docker compose; they're (IMO) a much better way to represent container arguments and dependencies than a bunch of ad hoc shell scripts with a ton of docker command-line options. Starting from compose YAML, people can go anywhere from the docker command line to Helm charts for K8s.

So they serve a dual purpose: usable with docker compose, and (more or less) directly referenceable documentation for what's required to run the container(s) however you intend to.
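To illustrate that dual purpose, each compose key corresponds to a docker run flag, so the YAML reads as documentation even on a system without docker compose. The values below are placeholders, with the equivalent flags as comments:

```yaml
# Illustrative mapping of compose keys to docker run flags (placeholder values).
services:
  wis:
    image: toverainc/willow-inference-server   # docker run ... toverainc/willow-inference-server
    restart: unless-stopped                    # --restart unless-stopped
    ports:
      - "19000:19000"                          # -p 19000:19000
    volumes:
      - ./cache:/app/cache                     # -v "$(pwd)/cache:/app/cache"
    environment:
      - LOG_LEVEL=info                         # -e LOG_LEVEL=info (hypothetical variable)
```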

sparkydave1981 commented 1 year ago

> @sparkydave1981 What functionality would you be looking to enable by providing Home Assistant integration?

I was referring to an HA add-on, not an integration. Just a way to run WIS within Home Assistant OS.

kristiankielhofner commented 1 year ago

For CPU-only support this is pretty straightforward.

For GPU far from it.

The issue is that without a GPU, WIS is only slightly faster on CPU than the official HAOS faster-whisper container. Granted, it has the protocol support for Willow, so that's a big plus and "gets you there", but given the fundamental performance difference between CPU and GPU, the end-user experience won't be much different in terms of response time, accuracy, etc. WIS on the slowest supported GPU can run Whisper large faster than most CPUs can run tiny, and that's an incredible difference in accuracy and end-user experience (tiny is essentially useless).

joelmenezes commented 1 year ago

Last I checked, docker-compose had limited/no support for GPU passthrough.

While I agree docker-compose makes the most sense in most instances, in my use case, running an Unraid server with GPU passthrough to a Plex Media Server container, I'd prefer to have a Dockerfile so I can utilize my existing hardware.

So, eagerly waiting for WIS and Willow v1.0!

kristiankielhofner commented 1 year ago

You need both: a Dockerfile is for building images, not running containers. From there it's either `docker run` or `docker compose`.

I'm not sure what you mean by "limited/no support for GPU passthrough" - it's equivalent to the GPU arguments for docker, albeit with different syntax, of course. You can see both our Dockerfile and docker-compose.yml in the wisng branch.
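For the record, the two GPU syntaxes side by side. This is a sketch with an assumed image name, but both forms drive the same NVIDIA container runtime underneath:

```yaml
# Plain docker (shown as a comment, since this block is YAML):
#   docker run --gpus all toverainc/willow-inference-server
# Compose equivalent: a GPU device reservation.
services:
  wis:
    image: toverainc/willow-inference-server
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all               # same effect as --gpus all
              capabilities: [gpu]
```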

ashishpandey commented 5 months ago

Is there any movement on pushing Docker images? I use WIS in a Docker environment (Unraid), which pretty much means building a local image and pushing it to my local registry to consume from. It would be nice to cut out that step if possible.

Would there be a problem with continuously and automatically pushing the main branch to the latest tag on Docker Hub, for a start? That would have the benefit of letting users test the bleeding edge if they want to, and it could evolve into release-tagged versions from there when you think it's ready.

Would you be interested in contributions to this effect?
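For what it's worth, a continuous push of main to the latest tag could be roughly this small. A sketch of a GitHub Actions workflow, where the workflow path, secret names, and image tag are all assumptions:

```yaml
# .github/workflows/docker-publish.yml (illustrative sketch, not an official workflow)
name: Publish Docker image
on:
  push:
    branches: [main]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}  # assumed secret names
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: toverainc/willow-inference-server:latest
```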

tensiondriven commented 5 months ago

+1, if this existed, I would run it on Unraid as well.

Sigfrodr commented 4 months ago

+1

kristiankielhofner commented 4 months ago

Quick update on this: we had some challenges because we needed to extend the underlying engine of WIS (ctranslate2) to support CUDA 12, but CUDA 12 is now natively supported by ctranslate2, so building and publishing Docker images is significantly more straightforward. We have some testing and validation to do, but with that done, published Docker images are coming soon!

Sigfrodr commented 4 months ago

> Quick update on this: […] building and publishing Docker images is significantly more straightforward. […] published Docker images are coming soon!

Thanks for the feedback, great news!

janstadt commented 3 months ago

Any updates on this? Would love to start fiddling around with this.

Sigfrodr commented 2 months ago

+1