painebenjamin / app.enfugue.ai

ENFUGUE is an open-source web app for making studio-grade images and video using generative AI.
GNU General Public License v3.0

Run containerised / docker-compose support? #47

Closed sammcj closed 9 months ago

sammcj commented 11 months ago

Just wondering if there are any plans to have this bundled up into a container image, and perhaps a docker-compose file?

This would make it really easy for people both to try it out and to run it on their home servers etc…

painebenjamin commented 11 months ago

Hello, thank you for the suggestion!

You have impeccable timing; I just got the container registry up and running yesterday, and the docker version just passed all my automated tests.

I'm going to do a little more testing before I make an announcement post and do a full tag of 0.2.1, but it's available now if you'd like to try it. I'll be looking into what (if anything) I need to do for docker-compose support, but these are the in-progress instructions for using the container:

Running Containerized

Since this requires a GPU for effective operation, your host machine must have a GPU and be able to host GPU-accelerated Docker containers. At the moment, this is only possible using the Nvidia Container Toolkit, which is available for Linux machines. You must install this toolkit and then restart the Docker daemon before you can launch Enfugue with GPU acceleration.

The containerized version includes TensorRT support.

Pulling the Container

The container is available directly from the GitHub Container Registry; it can be pulled like this:

docker pull ghcr.io/painebenjamin/app.enfugue.ai:latest

Testing Capabilities

To check if the container is working and can communicate with your GPU, you can run the version command in the container.

docker run --rm --gpus all --runtime nvidia ghcr.io/painebenjamin/app.enfugue.ai:latest version                         

This is the expected result:

Enfugue v.0.2.1
Torch v.1.13.1+cu117

AI/ML Capabilities:
---------------------
Device type: cuda
CUDA: Ready
TensorRT: Ready
DirectML: Unavailable
MPS: Unavailable

Running

The basic run command is:

docker run --rm --gpus all --runtime nvidia -v ${{ YOUR CACHE DIRECTORY }}:/root/.cache -p 45554:45554 ghcr.io/painebenjamin/app.enfugue.ai:latest run

Important arguments are:

  1. Passing --gpus is essential to tell Docker which GPUs to permit the container to use.
  2. Pass --runtime nvidia to force Docker to use the Nvidia runtime.
  3. Pass -v to mount a volume of your choice to /root/.cache, which is the parent directory where Enfugue looks for files, downloads checkpoints, etc. These directories can also be changed to anything else in the UI as needed.
  4. Pass -p to bind the local port 45554 to the container port 45554.

You'll then be able to access the UI at https://app.enfugue.ai:45554. See below for information on changing the port.

Networking and Configuration

Enfugue uses a domain-based approach to allow for SSL encryption, enabling some browser features. For this reason, Enfugue uses the domain app.enfugue.ai, which resolves to the loopback address, 127.0.0.1. If you want to run Enfugue somewhere other than your local machine, you have two options:

  1. The easiest way to make things work is to put an entry in your hosts file overriding app.enfugue.ai to the IP address of the machine running Enfugue (see the example below). No other configuration is necessary if you do it this way.
  2. Configure Enfugue to use a different domain (or no domain at all).
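
For option 1, a one-liner like the following works on Linux and macOS (the IP address is just an example - use the address of the machine running Enfugue):

# Point app.enfugue.ai at your Enfugue host instead of 127.0.0.1
echo "192.168.1.100 app.enfugue.ai" | sudo tee -a /etc/hosts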

See Configuration for Advanced Users on how to use configuration files. You will need to ensure any configuration file passed can be read by the Docker container - this guide uses environment variables for ease-of-use.

Use a combination of SERVER_SECURE, SERVER_DOMAIN and SERVER_PORT to control how Enfugue assembles URLs and listens to requests.
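
For example, to run without SSL on a custom domain and port (a sketch - the values are illustrative):

# Illustrative values; adjust to your environment
docker run --rm --gpus all --runtime nvidia \
  -e SERVER_SECURE=0 \
  -e SERVER_DOMAIN=my-cool-server \
  -e SERVER_PORT=45554 \
  -p 45554:45554 \
  -v ${{ YOUR CACHE DIRECTORY }}:/root/.cache \
  ghcr.io/painebenjamin/app.enfugue.ai:latest run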

sammcj commented 11 months ago

Oh wow, wasn't expecting that so quickly!

I've just done a quick check and the image runs:

latest: Pulling from painebenjamin/app.enfugue.ai
....
fedba0b9c0e3: Pull complete
Digest: sha256:596807e766bbc0adc4f3f276901459dece88a3d9088982db33a235d1c8329691
Status: Downloaded newer image for ghcr.io/painebenjamin/app.enfugue.ai:latest
Enfugue v.0.2.1
Torch v.1.13.1+cu117

AI/ML Capabilities:
---------------------
Device type: cuda
CUDA: Ready
TensorRT: Ready
DirectML: Unavailable
MPS: Unavailable

I'll do a proper test and let you know how it goes.

Here's a docker-compose for you:

services:
  enfugue:
    image: ghcr.io/painebenjamin/app.enfugue.ai:latest
    container_name: enfugue
    restart: unless-stopped
    profiles:
      - enfugue                 # Optional
    shm_size: "2gb"             # Optional (Gives enfugue access to more shared memory)
    security_opt:
      - no-new-privileges:true  # Optional
    runtime: nvidia
    environment:
      - SERVER_DOMAIN=my-cool-server
      - SERVER_PORT=45554
      - SERVER_SECURE=0
      - PUID=1001               # Optional (set to your non-root user)
      - PGID=1001               # Optional (set to your non-root group)
      - UMASK=002               # Optional
      - TZ=${TZ}                # Optional (set to your timezone)
    ports:
      - "45554:45554"
    volumes:
      - /mnt/REPLACEME/enfugue/cache:/root/.cache
      - /etc/localtime:/etc/localtime:ro
    command: ["run"]
    tty: true
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu, compute]
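
Since the service is in a profile, bring it up with the profile enabled:

docker compose --profile enfugue up -d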

sammcj commented 11 months ago

Easy enough to get going with that container image.

Other than taking a long time to pull, it "just worked" - which was an awesome user experience!

Two key things I'd recommend:

1) Reduce the image size.

The image is massive at 11GB - I would expect it to be somewhere in the realm of 250-650MB tops. I suspect something isn't quite right with the build.

I couldn't see your Dockerfile but I suspect perhaps you might be baking a full operating system into the final image rather than just the application.

2) Make sure you're dropping down to a non-root user inside the container.

Having the storage directory in /root/.cache is a little confusing. The application should drop down to a non-root user before it runs, and as such shouldn't have access to /root. I'd recommend having the app run out of /app, which is a common pattern.

.cache - A hidden cache dir is usually a pattern followed for temporary files (such as in-flight data that the user would never need to access or make use of directly). I'd recommend using something like /data/{cache,models,lora,prompts,output} etc...

painebenjamin commented 11 months ago

That's excellent news that it worked out of the box! 🙏

Thank you so much for the offer of assistance - it sounds like you have quite a bit more docker experience than I do, so I'd love your thoughts. Here's my overly-organized summary:

Image Size

First off, with the image size - it is definitely outsized by quite a lot, though I don't think we'll be able to get it down to the size you described (I could totally be wrong, though). The base image is 1.6 GB compressed, with the biggest space hogs being the CUDA and CUDNN runtimes. It's entirely possible I'm including multiple copies of the runtimes, though.
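
For reference, docker history shows how much each layer contributes:

# Inspect per-layer sizes of the published image
docker history ghcr.io/painebenjamin/app.enfugue.ai:latest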

Here is the Dockerfile - the templated variables get populated during build. There are a bunch of inefficiencies - installing software-properties-common just to get add-apt-repository for Python 3.10, for example. I can guarantee the system has a Python 3.8 installation somewhere that isn't necessary. Enfugue will work with 3.8, but 3.10 has notably better performance, especially in memory allocation, which I've had to dedicate a lot of time to managing.

These are less important for controlling the build, but they pertain to it if you want to take a look: this is the GitHub runner workflow for building and publishing, and the relevant section of the Makefile.

We can shave 600-1000MB off by not including the CUDNN runtime (and using this image instead), since that's only needed for TensorRT, but then we're locked into shipping multiple SKUs, and I'm already doing quite a bit of that. It's definitely an option, but I'd like to try whatever else I can before shipping gets even more complicated than it already is.

Root vs. Non-Root

I'm currently not making any attempt to gain or shed administrator privileges, though I think that just means I'm always working as root. Other than the file locations (more on that in a second), what are the advantages/disadvantages of using a non-root user, besides the obvious inability to break stuff?

File Location

You're not the first to mention the location - I definitely think the defaults should be moved, but I'm not sure what location works best cross-platform, knowing that this will be executed as a non-privileged user in basically all situations. I'd love your input there.

The original choice wasn't totally arbitrary; it's the same place that Stable Diffusion downloads to by default (~/.cache/huggingface on all platforms). In my mind, the original intended audience for Enfugue was people who'd tried to use SD in the past, so I kept the same location, as those users would be used to that spot - but I think my assumptions regarding who wants to use Enfugue haven't completely played out.

Thanks again for your help! I'm so glad to get another set of eyes on this.

sammcj commented 11 months ago

re: Image size.

I had a good look and I totally agree with you - CUDA, CUDNN and friends seem pretty heavy. What could be done is to profile a running container to generate a list of all files accessed during normal operation, and then automatically add only those files in - but it's been a while since I've done that, so I'd need to do some more research first.
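
Something along these lines, as a rough sketch (it assumes strace is available in the image and that the entrypoint can be overridden - and "enfugue run" below stands in for whatever the real entrypoint invocation is):

# Rough sketch: log every file the app opens during a normal run
docker run --rm --gpus all --runtime nvidia \
  --cap-add SYS_PTRACE \
  -v "$(pwd)":/trace \
  --entrypoint strace \
  ghcr.io/painebenjamin/app.enfugue.ai:latest \
  -f -e trace=openat -o /trace/files.log enfugue run

# Collapse the log into a sorted list of unique paths
grep -oP '(?<=openat\(AT_FDCWD, ")[^"]+' files.log | sort -u > accessed-files.txt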

re: Root vs. Non-root.

Other than the effort of dropping down to the non-root user and recursively chowning anything it should have access to, there's no real downside. The upside is that, on the host running it, you're not (by default) mapping the running application to your root user (which means anything inside the container could basically do anything outside the container).
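
You can see the current behaviour quickly - the process inside the container runs as UID 0, i.e. the host's root:

# Check which user the container runs as (expected output: uid=0(root) ...)
docker run --rm --entrypoint id ghcr.io/painebenjamin/app.enfugue.ai:latest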

re: File locations.

Yeah, it's not a big deal at all. I've noticed really weird directory paths seem to be a common thing within the ML/AI community - several other projects I've seen recently use directories called 'cache' for storing data. (Not a big deal, I guess, but some folks might want to make sure their models are included in backups, which - if they're in a cache directory - might be excluded.)

--

Aside from those quite minor feedback items, I wanted to say: really well done on this whole project - it works incredibly well!

painebenjamin commented 10 months ago

Thank you for the kind words!

I've finally got 0.2.1 tagged and up, which includes dropping down to non-root. I wanted to point out that I had some collisions with environment variables, so I changed the environment configuration overrides to all have a prefix of ENFUGUE_ for namespacing - it's now ENFUGUE_SERVER_HOST, ENFUGUE_SERVER_PORT, etc.
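
So, for example, the environment overrides from the compose file above would now look like this (illustrative values):

# As of 0.2.1, environment configuration is namespaced with ENFUGUE_
docker run --rm --gpus all --runtime nvidia \
  -e ENFUGUE_SERVER_DOMAIN=my-cool-server \
  -e ENFUGUE_SERVER_PORT=45554 \
  -e ENFUGUE_SERVER_SECURE=0 \
  -p 45554:45554 \
  ghcr.io/painebenjamin/app.enfugue.ai:latest run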

JohanAR commented 10 months ago

Regarding image size, anything pulling in Stable Diffusion is going to be bloated, I'm afraid. Both Automatic1111 and ComfyUI build into 7GB Docker images. If you run them without Docker you get a venv/conda environment of the same size instead, unless you install globally and risk messing up your system Python. Kohya builds into a 28GB Docker image :S

Perhaps someone could make a base image with CUDA and the diffusers library, and try to convince all the projects to build their images on top of it, but that would of course be outside the scope of Enfugue.

sammcj commented 10 months ago

I've finally got 0.2.1 tagged and up, which includes dropping down to non-root.

🎉 that is awesome!

As the project is aimed at people interested in looking into / working from their own data and documents, in my opinion that makes it a slightly higher-value target than, say, general image generation projects - so it's great to see this. It's only a small thing, but you're setting a good example for other AI/ML projects by doing this.

--

Kohya builds into a 28GB docker image

what the… 🤦 I don't know that one, but I'm willing to bet it's a Windows thing. The whole point of container images is that they're lightweight, portable and stateless.

I honestly don’t see how applications and their libraries can be this big - I suspect that people are bundling data such as models in with the application. 28GB is significantly larger than an entire Linux distribution with every single application installed 🤣.

Perhaps someone could make a base image with cuda…

Part of the problem I've seen is that people seem to be using a full Ubuntu container image for the various AI/ML projects - something Nvidia is guilty of too - and not even the base runtime images (which are still massive), but the large devel images, which are designed for building from rather than running from.

…and the diffusers library.

What's actually in that stable diffusion library? Surely it can't be a GB of Python or C?

If I had more time I'd probably use a tool to watch a container under normal operation to see what files it's actually using - I'd be willing to bet that, if there really isn't data being bundled in, the files actually needed would amount to less than 50% of the image.

Anyway - no big deal at all but it is interesting to see all this rapid development across projects such as this.

painebenjamin commented 10 months ago

...and the diffusers library.

I do not believe A1111 or Comfy actually use Diffusers. I could be wrong, but I've been diving into their code for a few months now to cannibalize parts for Enfugue, and I have seen A1111 remove its dependency on Diffusers, and I don't believe Comfy ever had one.

Instead of Diffusers, they have their own code modeling the SD neural nets (but they still use torch and the same weights, so it's very similar).

There are advantages and disadvantages there. For advantages, both Comfy and A1111 are faster than Enfugue/Diffusers on their own. There's too much code there to isolate any one significant change that improves their performance over base Stable Diffusion, but it does improve. That also means the versions in their applications are not the same as the official Stability release, so results from Comfy SD or A1111 SD will always be different from the official SD pipeline - whether those speed gains have come with quality tradeoffs is up to the user to decide. For example, I'm fairly certain ComfyUI does not use two text encoders for SDXL Base (1.5 and 2 only used one), but official SDXL and Enfugue do, which lets Comfy operate SDXL with significantly less overhead. The results are different, of course, but not necessarily worse.

For disadvantages, they have to re-implement every feature that Stability pushes out - or at least they can't simply upgrade their Diffusers version and try out the new feature. Enfugue doesn't have that disadvantage - I had XL ControlNet Canny working within minutes of release because it was only about four lines to get it in. Of course, that means Enfugue isn't as optimized as the other applications in general, but I'd have a hard time managing the scope of this project on my own without leveraging as many libraries as possible. Perhaps some of the authors of those improvements will find their way here eventually! :)

JohanAR commented 10 months ago

what the… 🤦 I don't know that one, but I'm willing to bet it's a Windows thing. The whole point of container images is that they're lightweight, portable and stateless.

I honestly don’t see how applications and their libraries can be this big - I suspect that people are bundling data such as models in with the application. 28GB is significantly larger than an entire Linux distribution with every single application installed rofl.

I didn't look, but I think it's a pretty safe bet that kohya_ss downloaded a couple of GB of models when building the image.

It seems like CUDA-related .so files are massive - several of them are 100-500MB each. It's a similar situation with torch and tensorflow: they're filled with large .so files, many named libcu*, so I'm guessing those are also CUDA-related. libtensorflow_cc.so is 1GB. No idea what they put in there - maybe something like pre-compiled GPU programs for every single Nvidia card made :)
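
A quick (hypothetical) way to confirm where the bulk sits, assuming bash is available in the image:

# List every file over 100MB baked into the image
docker run --rm --entrypoint bash ghcr.io/painebenjamin/app.enfugue.ai:latest \
  -c 'find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null'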

I do not believe A1111 nor Comfy actually use Diffusers

Looking at GitHub, it appears that A1111 is indeed no longer using Diffusers. It's been a few months since I looked at any of its code. And I kind of just assumed Comfy was using it, so I believe you when you say it doesn't.