Closed Limbicnation closed 1 year ago
I have this issue when running the Docker image as well. Here's a snippet of the errors encountered. I haven't included the entire traceback or the full output, just the error messages:
Environment: AWS g5.2xlarge, Ubuntu 22.04, with the NVIDIA drivers and the NVIDIA Container Toolkit installed.
Just before the "Jina" ASCII art logo, I get this:
/home/dalle/.cache/bert.pt: No such file or directory
/home/dalle/.cache/kl-f8.pt: No such file or directory
/home/dalle/.cache/finetune.pt: No such file or directory
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
⠸ Waiting dalle diffusion upscaler summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/6 0:00:01ERROR diffusion/rep-0@381 FileNotFoundError(2, 'No such file or directory') during <class [09/21/22 18:32:49]
'jina.serve.runtimes.worker.WorkerRuntime'> initialization
add "--quiet-error" to suppress the exception details
╭──────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────╮
│ /usr/local/lib/python3.8/dist-packages/jina/orchestrate/pods/__init__.py:74 in run │
│ │
│ 71 │ │
│ 72 │ try: │
│ 73 │ │ _set_envs() │
│ ❱ 74 │ │ runtime = runtime_cls( │
│ 75 │ │ │ args=args, │
│ 76 │ │ ) │
│ 77 │ except Exception as ex: │
│ │
│ /usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py:32 in __init__ │
│ │
│ 29 │ │ :param kwargs: keyword args │
│ 30 │ │ """ │
│ 31 │ │ self._health_servicer = health.HealthServicer(experimental_non_blocking=True) │
│ ❱ 32 │ │ super().__init__(args, **kwargs) │
│ 33 │ │
│ 34 │ async def async_setup(self): │
│ 35 │ │ """ │
The Traceback block continues through the stack, and then:
FileNotFoundError: [Errno 2] No such file or directory: '../glid-3-xl/finetune.pt'
DEBUG diffusion/rep-0@381 process terminated
DEBUG diffusion/rep-0@371 waiting for ready or shutdown signal from runtime [09/21/22 18:32:49]
DEBUG diffusion/rep-0@371 shutdown is is already set. Runtime will end gracefully on its own
DEBUG diffusion/rep-0@371 terminating the runtime process
DEBUG diffusion/rep-0@371 runtime process properly terminated
DEBUG diffusion/rep-0@371 terminated
DEBUG diffusion/rep-0@371 joining the process
DEBUG diffusion/rep-0@371 successfully joined the proces
And then at the very end:
Traceback (most recent call last):
File "/usr/local/bin/jina", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/jina_cli/__init__.py", line 143, in main
getattr(api, args.cli.replace('-', '_'))(args)
File "/usr/local/lib/python3.8/dist-packages/jina_cli/api.py", line 173, in flow
with f:
File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py", line 1416, in __enter__
return self.start()
File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/builder.py", line 33, in arg_wrapper
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py", line 1472, in start
self._wait_until_all_ready()
File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py", line 1571, in _wait_until_all_ready
raise RuntimeFailToStart
jina.excepts.RuntimeFailToStart
OK, so I'm not a Docker expert, but this looks like a permissions issue with how the dalle user inside the container accesses volumes mounted from the host. Inside the container, the user is UID 1000, so the folder being shared from the host needs to be owned by UID 1000. Also, the README has the docker run -v option backwards. In Docker, the syntax for mounting a volume like that is:
-v host_src:container_dest
That means, to launch a container from the image, you really need to run this:
docker run -p 51005:51005 -it -v /home/ubuntu/.cache:/home/dalle/.cache --gpus all jinaai/dalle-flow
On the local host, the directory /home/ubuntu/.cache needs to be present and needs to be owned by UID 1000.
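A minimal sketch of checking that ownership before launching the container, assuming a Linux host with GNU `stat`. The function name `check_cache_owner` is made up for illustration and is not part of dalle-flow or Docker.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory exists and is owned
# by the expected numeric UID. Returns non-zero on any problem.
check_cache_owner() {
    dir="$1"
    want_uid="$2"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    have_uid=$(stat -c '%u' "$dir")   # GNU stat: numeric UID of the owner
    if [ "$have_uid" = "$want_uid" ]; then
        echo "ok: $dir is owned by UID $want_uid"
    else
        echo "mismatch: $dir is owned by UID $have_uid, expected $want_uid"
        return 1
    fi
}
```

For the setup described here, the call would be `check_cache_owner /home/ubuntu/.cache 1000`. If it reports a mismatch, something like `sudo chown -R 1000:1000 /home/ubuntu/.cache` should fix the ownership, assuming you have root on the host.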
If the prebuilt image doesn't work, try building the image locally by running the command below:
docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .
Then try running it with a similar command:
docker run -p 51005:51005 \
-it \
-v $HOME/.cache:/home/dalle/.cache \
--gpus all \
jinaai/dalle-flow
I believe this will resolve the issue. If not, feel free to reopen.
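The `docker build` command in this comment passes the host user's IDs in through `$(id -u ${USER})` and `$(id -g ${USER})`. A small sketch for seeing what those expand to on your host, and whether the UID differs from the UID 1000 reported earlier in this thread for the prebuilt image's dalle user. The variable names `host_uid` and `host_gid` are only for illustration.

```shell
#!/bin/sh
# Print the values the build command would pass as USER_ID and GROUP_ID.
host_uid=$(id -u)
host_gid=$(id -g)
echo "USER_ID=$host_uid GROUP_ID=$host_gid"

# Assumption taken from this thread: the prebuilt image's user is UID 1000.
if [ "$host_uid" -eq 1000 ]; then
    echo "host UID matches 1000"
else
    echo "host UID differs from 1000"
fi
```

If the UIDs differ, that would be consistent with the prebuilt image failing to read the mounted cache directory, which is the case the local build is meant to cover.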
diffusion/rep-0@36167 fail to load file dependency [07/05/22 03:44:38]
⠙ Waiting dalle diffusion summary... ━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━ 4/7 0:00:00
ERROR diffusion/rep-0@36167 FileNotFoundError(2, 'No such file or directory') [07/05/22 03:44:38] during <class 'jina.serve.runtimes.worker.WorkerRuntime'> initialization
add "--quiet-error" to suppress the exception details
FileNotFoundError: [Errno 2] No such file or directory: '../glid-3-xl/finetune.pt'
are located in
/dalle-flow/glid-3-xl