radames / Real-Time-Latent-Consistency-Model

App showcasing multiple real-time diffusion models pipelines with Diffusers
https://huggingface.co/spaces/radames/Real-Time-Latent-Consistency-Model
Apache License 2.0
862 stars 101 forks

error related to "public" directory #30

Closed kylemcdonald closed 9 months ago

kylemcdonald commented 10 months ago

Last week I was able to run this code at commit ee4d6591e6791c2db474708e67642563ce4ff3f8 on a 3080 machine I have access to, but today I tried on a 4090 machine at commit c640c480d7b746dfc4af71977b85c8876d64355f and got the following errors after following the same steps. I wonder if it's related to a recent change in the codebase, or to a difference in setup between the two machines.

Edit: I was able to confirm that the commit ee4d6591e6791c2db474708e67642563ce4ff3f8 works on the new system, as do the next two commits. I will try to hunt down the exact issue.

This is after running:

docker build -t lcm-live .
docker run -ti -p 7860:7860 --gpus all lcm-live
> Using @sveltejs/adapter-static
error during build:
Error: EACCES: permission denied, mkdir '../public/_app/immutable/assets'
    at Object.mkdirSync (node:fs:1379:3)
    at mkdirp (file:///home/user/app/frontend/node_modules/@sveltejs/kit/src/utils/filesystem.js:7:6)
    at go (file:///home/user/app/frontend/node_modules/@sveltejs/kit/src/utils/filesystem.js:58:4)
    at file:///home/user/app/frontend/node_modules/@sveltejs/kit/src/utils/filesystem.js:55:5
    at Array.forEach (<anonymous>)
    at go (file:///home/user/app/frontend/node_modules/@sveltejs/kit/src/utils/filesystem.js:54:25)
    at file:///home/user/app/frontend/node_modules/@sveltejs/kit/src/utils/filesystem.js:55:5
    at Array.forEach (<anonymous>)
    at go (file:///home/user/app/frontend/node_modules/@sveltejs/kit/src/utils/filesystem.js:54:25)
    at file:///home/user/app/frontend/node_modules/@sveltejs/kit/src/utils/filesystem.js:55:5

frontend build failed
 exit 1

pipeline: controlnet 
DEVICE: cuda
TORCH_DTYPE: torch.float16
PIPELINE: controlnet
SAFETY_CHECKER: False
TORCH_COMPILE: False
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 996/996 [00:00<00:00, 10.3MB/s]
diffusion_pytorch_model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.45G/1.45G [00:39<00:00, 36.9MB/s]
model_index.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 588/588 [00:00<00:00, 5.94MB/s]
tokenizer/special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 133/133 [00:00<00:00, 2.47MB/s]
scheduler/scheduler_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 539/539 [00:00<00:00, 11.2MB/s]
text_encoder/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 610/610 [00:00<00:00, 12.9MB/s]
tokenizer/tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 765/765 [00:00<00:00, 16.0MB/s]
(…)ature_extractor/preprocessor_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 518/518 [00:00<00:00, 11.6MB/s]
vae/config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 651/651 [00:00<00:00, 3.66MB/s]
unet/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.73k/1.73k [00:00<00:00, 34.6MB/s]
tokenizer/merges.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 824kB/s]
tokenizer/vocab.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 1.56MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 492M/492M [00:21<00:00, 23.1MB/s]
diffusion_pytorch_model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 335M/335M [00:27<00:00, 12.0MB/s]
diffusion_pytorch_model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.44G/3.44G [01:52<00:00, 30.6MB/s]
Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [01:53<00:00,  8.73s/it]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 32.79it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.controlnet.pipeline_controlnet_img2img.StableDiffusionControlNetImg2ImgPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Traceback (most recent call last):
  File "/home/user/app/run.py", line 5, in <module>
    uvicorn.run(
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/main.py", line 587, in run
    server.run()
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/server.py", line 68, in serve
    config.load()
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/config.py", line 467, in load
    self.loaded_app = import_from_string(self.app)
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/user/app/app.py", line 21, in <module>
    init_app(app, user_data, args, pipeline)
  File "/home/user/app/app_init.py", line 158, in init_app
    os.makedirs("public")
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: 'public'
radames commented 10 months ago

hi @kylemcdonald, sorry about this. Yes, I've rebuilt the codebase so it's easier to add new pipelines and cool frontend features. I just did a fresh Docker install and didn't hit that issue, which is very weird. One note: I merged this the other day https://github.com/radames/Real-Time-Latent-Consistency-Model/blob/6df186bdfa9e23bbf48724a29b669e8a79989e67/app_init.py#L157-L158
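For context, the idempotent-directory pattern being discussed looks roughly like this (a minimal sketch, not the repo's exact code; note it makes re-runs safe against FileExistsError but would still raise PermissionError if the working directory itself isn't writable, which is the failure in the traceback above):

```python
import os

# Create "public" only when it is missing, so re-running the app does
# not fail on an existing directory. This does NOT help when the parent
# directory is not writable by the current user: os.makedirs then raises
# PermissionError (errno 13), as seen in the traceback above.
if not os.path.exists("public"):
    os.makedirs("public")
```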

build-run.sh essentially goes into frontend and runs the bundler, which writes its output to ../public: https://github.com/radames/Real-Time-Latent-Consistency-Model/blob/6df186bdfa9e23bbf48724a29b669e8a79989e67/frontend/svelte.config.js#L7-L12

Another tip when using Docker: you can point HF_HOME at your local cache folder, so there's no need to re-download the models:

docker run -ti -p 7860:7860 -e HF_HOME=/data -v ~/.cache/huggingface:/data --gpus all lcm-live

kylemcdonald commented 10 months ago

Currently checking commit by commit.

ff9325eeb0708a2c58e92a0643f940d1a7186e1d ModuleNotFoundError: No module named 'latent_consistency_controlnet'

d6fedfa3656b32f3d856a808851ca70878acafa6 ERROR: Error loading ASGI app. Could not import module "app-controlnet"

3207814394b0befdf520b7fe077e4b190a048271 ERROR: Error loading ASGI app. Could not import module "app-controlnet"

be970948e7372b77f9284af204eb34758f012340 ERROR: Error loading ASGI app. Could not import module "app-controlnet"

0e617d2f94b308c049255829e58d8ca9fb4cea12 ERROR: Cannot install -r /code/requirements.txt (line 1), -r /code/requirements.txt (line 11), -r /code/requirements.txt (line 2), -r /code/requirements.txt (line 3), -r /code/requirements.txt (line 9), gradio and transformers because these package versions have conflicting dependencies.

jumping forward to the next update to requirements.txt

0e5136c742a4416d95ce25573996e802a9ce335e this is where the docker image can first be built, and also where the above error appears.

so somewhere between this commit and ee4d6591e6791c2db474708e67642563ce4ff3f8 the problem was introduced.

radames commented 10 months ago

hi @kylemcdonald, thanks for the detailed tests. Interesting: when I start a fresh Docker build from the latest commit https://github.com/radames/Real-Time-Latent-Consistency-Model/commit/74f6ce196bb9220a4d33d228da560c64b3d1ada4, I can build and run successfully.

docker build -t lcm-live .
docker run -ti -p 7860:7860 -e PIPELINE=controlnetSDXLTurbo  -e HF_HOME=/data -v ~/.cache/huggingface:/data  --gpus all lcm-live
$ docker --version
Docker version 24.0.7, build afdd53b
$ apt list --installed | grep nvidia-container-toolkit
nvidia-container-toolkit-base/unknown,now 1.14.3-1 amd64 [installed,automatic]
nvidia-container-toolkit/unknown,now 1.14.3-1 amd64 [installed]
kylemcdonald commented 10 months ago

I tried it again on a totally clean machine and got a different error today.

sudo docker run -ti -p 7860:7860 -e PIPELINE=controlnetSDXLTurbo  -e HF_HOME=/data -v ~/.cache/huggingface:/data  --gpus all lcm-live

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

added 265 packages, and audited 266 packages in 3s

59 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
npm notice 
npm notice New patch version of npm available! 10.2.3 -> 10.2.4
npm notice Changelog: https://github.com/npm/cli/releases/tag/v10.2.4
npm notice Run npm install -g npm@10.2.4 to update!
npm notice 

> frontend@0.0.1 build
> vite build

vite v4.5.0 building SSR bundle for production...
✓ 94 modules transformed.

vite v4.5.0 building for production...
✓ 87 modules transformed.
.svelte-kit/output/client/_app/version.json                              0.03 kB │ gzip:  0.05 kB
.svelte-kit/output/client/.vite/manifest.json                            2.77 kB │ gzip:  0.48 kB
.svelte-kit/output/client/_app/immutable/assets/2.4b49e46c.css           0.88 kB │ gzip:  0.27 kB
.svelte-kit/output/client/_app/immutable/assets/0.25926159.css           9.73 kB │ gzip:  2.74 kB
.svelte-kit/output/client/_app/immutable/nodes/0.c2e92d24.js             0.60 kB │ gzip:  0.38 kB
.svelte-kit/output/client/_app/immutable/chunks/index.57cd3851.js        0.92 kB │ gzip:  0.57 kB
.svelte-kit/output/client/_app/immutable/nodes/1.9764e15d.js             1.03 kB │ gzip:  0.59 kB
.svelte-kit/output/client/_app/immutable/chunks/singletons.12258393.js   2.45 kB │ gzip:  1.26 kB
.svelte-kit/output/client/_app/immutable/chunks/scheduler.d303939e.js    2.49 kB │ gzip:  1.16 kB
.svelte-kit/output/client/_app/immutable/entry/app.f3bbbc43.js           5.94 kB │ gzip:  2.34 kB
.svelte-kit/output/client/_app/immutable/chunks/index.c3a2e323.js        7.64 kB │ gzip:  3.11 kB
.svelte-kit/output/client/_app/immutable/entry/start.82e39185.js        24.87 kB │ gzip:  9.83 kB
.svelte-kit/output/client/_app/immutable/nodes/2.c1980465.js            71.73 kB │ gzip: 21.49 kB
✓ built in 392ms
.svelte-kit/output/server/.vite/manifest.json                          1.85 kB
.svelte-kit/output/server/_app/immutable/assets/_page.4b49e46c.css     0.88 kB
.svelte-kit/output/server/_app/immutable/assets/_layout.25926159.css   9.73 kB
.svelte-kit/output/server/entries/pages/_page.ts.js                    0.05 kB
.svelte-kit/output/server/internal.js                                  0.19 kB
.svelte-kit/output/server/entries/pages/_layout.svelte.js              0.25 kB
.svelte-kit/output/server/entries/fallbacks/error.svelte.js            0.89 kB
.svelte-kit/output/server/chunks/index.js                              2.58 kB
.svelte-kit/output/server/entries/pages/_page.svelte.js                3.36 kB
.svelte-kit/output/server/chunks/ssr.js                                3.70 kB
.svelte-kit/output/server/chunks/internal.js                           5.50 kB
.svelte-kit/output/server/index.js                                    88.88 kB

Run npm run preview to preview your production build locally.

> Using @sveltejs/adapter-static
  Wrote site to "../public"
  ✔ done
✓ built in 1.45s

frontend build success 

pipeline: controlnetSDXLTurbo 
DEVICE: cuda
TORCH_DTYPE: torch.float16
PIPELINE: controlnetSDXLTurbo
SAFETY_CHECKER: False
TORCH_COMPILE: False
There was a problem when trying to write in your cache folder (/data/hub). Please, ensure the directory exists and can be written to.
There was a problem when trying to write in your cache folder (/data/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 371, in load_config
    config_file = hf_hub_download(
  File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1210, in hf_hub_download
    os.makedirs(storage_folder, exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data/hub'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/run.py", line 5, in <module>
    uvicorn.run(
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/main.py", line 587, in run
    server.run()
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/server.py", line 68, in serve
    config.load()
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/config.py", line 467, in load
    self.loaded_app = import_from_string(self.app)
  File "/home/user/.local/lib/python3.10/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/user/app/app.py", line 20, in <module>
    pipeline = pipeline_class(args, device, torch_dtype)
  File "/home/user/app/pipelines/controlnetSDXLTurbo.py", line 167, in __init__
    controlnet_canny = ControlNetModel.from_pretrained(
  File "/home/user/.local/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 712, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "/home/user/.local/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 415, in load_config
    raise EnvironmentError(
OSError: Can't load config for 'diffusers/controlnet-canny-sdxl-1.0'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'diffusers/controlnet-canny-sdxl-1.0' is the correct path to a directory containing a config.json file
$ docker --version
Docker version 24.0.7, build afdd53b
rzm@rzm-4090-0:~$ apt list --installed | grep nvidia-container-toolkit
nvidia-container-toolkit-base/unknown,now 1.14.3-1 amd64 [installed,automatic]
nvidia-container-toolkit/unknown,now 1.14.3-1 amd64 [installed]
kylemcdonald commented 10 months ago

I tried to run the script without Docker and got the same error, which made me think it is permissions related. I think this is because the first time the script ran, it created the Hugging Face cache as the root user. I don't understand why the Docker container would have trouble creating files when it had no trouble creating a directory. But I ran `sudo chown -R $USER:$USER ~/.cache/huggingface` and then the non-Docker version worked.
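That diagnosis can be sketched as a quick ownership check (the helper name is hypothetical): a cache directory first populated under `sudo`/root is owned by uid 0, so a later unprivileged run can read it but cannot create subdirectories such as `hub` inside it, which matches the PermissionError above.

```python
import os
from pathlib import Path

def owned_by_current_user(path: Path) -> bool:
    """Return True if `path` exists and is owned by the current uid.

    A root-owned ~/.cache/huggingface (uid 0) would explain the error
    above: the app user cannot mkdir inside it. (Helper name is
    illustrative, not from the repo.)
    """
    return path.exists() and path.stat().st_uid == os.getuid()

cache = Path.home() / ".cache" / "huggingface"
if cache.exists() and not owned_by_current_user(cache):
    print(f"fix with: sudo chown -R $USER:$USER {cache}")
```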

kylemcdonald commented 10 months ago

After changing the ownership on that directory, the Docker version worked too. For anyone else following along, these are the exact steps to reproduce:

# install curl
sudo apt-get update
sudo apt install -y curl

# install git
sudo apt install -y git

# install NVIDIA drivers
wget https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda_12.3.1_545.23.08_linux.run
sudo apt remove -y --purge "*nvidia*"
sudo apt install -y build-essential
sudo sh cuda_12.3.1_545.23.08_linux.run
rm cuda_12.3.1_545.23.08_linux.run

# install docker
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# install NVIDIA container toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
  && \
    sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd

# clone and build Real-Time-Latent-Consistency-Model
cd Documents
git clone https://github.com/radames/Real-Time-Latent-Consistency-Model.git
cd Real-Time-Latent-Consistency-Model
sudo docker build -t lcm-live .
sudo docker run -ti -p 7860:7860 -e HF_HOME=/data -v ~/.cache/huggingface:/data  --gpus all lcm-live

This will fail the first time; then run `sudo chown -R $USER:$USER ~/.cache/huggingface` and run the last line again.

radames commented 10 months ago

thanks, I'm glad it works now!