jina-ai / dalle-flow

🌊 A Human-in-the-Loop workflow for creating HD images from text
grpcs://dalle-flow.dev.jina.ai

FileNotFoundError: #62

Closed Limbicnation closed 1 year ago

Limbicnation commented 2 years ago

diffusion/rep-0@36167 fail to load file dependency [07/05/22 03:44:38]
⠙ Waiting dalle diffusion summary... ━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━ 4/7 0:00:00
ERROR diffusion/rep-0@36167 FileNotFoundError(2, 'No such file or directory') during <class 'jina.serve.runtimes.worker.WorkerRuntime'> initialization [07/05/22 03:44:38]
add "--quiet-error" to suppress the exception details

FileNotFoundError: [Errno 2] No such file or directory: '../glid-3-xl/finetune.pt'

   bert.pt
   finetune.pt
   kl-f8.pt

These files are located in /dalle-flow/glid-3-xl.
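
One thing worth checking here is where the relative path in the error actually points. The following is a minimal sketch, assuming the path '../glid-3-xl' is resolved against the directory the flow is started from; the helper name `missing_model_files` is hypothetical and not part of dalle-flow. Under that assumption, starting the flow from /dalle-flow would make '../glid-3-xl' resolve to /glid-3-xl rather than /dalle-flow/glid-3-xl.

```python
from pathlib import Path

# File names taken from the report above.
MODEL_FILES = ("bert.pt", "finetune.pt", "kl-f8.pt")


def missing_model_files(cwd, rel_dir="../glid-3-xl", names=MODEL_FILES):
    """Return the resolved paths of the model files that do not exist.

    `cwd` stands in for the directory the flow is started from
    (an assumption; the thread does not state it).
    """
    base = (Path(cwd) / rel_dir).resolve()
    return [str(base / name) for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Run this from the same directory you start the flow from.
    for path in missing_model_files(Path.cwd()):
        print("missing:", path)
```

If this prints any paths, either the files are in a different directory than the one the relative path resolves to, or the working directory is not the one expected.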

corporealfunk commented 1 year ago

I have this issue when running the Docker image as well. Here's a snippet of the errors encountered. I haven't included the entire traceback or the full output, just the error messages:

Environment: AWS g5.2xlarge, Ubuntu 22.04, with the NVIDIA drivers and the NVIDIA Container Toolkit installed.

Just before the "Jina" ASCII art logo, I get this:

/home/dalle/.cache/bert.pt: No such file or directory
/home/dalle/.cache/kl-f8.pt: No such file or directory
/home/dalle/.cache/finetune.pt: No such file or directory
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
⠸ Waiting dalle diffusion upscaler summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/6 0:00:01
ERROR  diffusion/rep-0@381 FileNotFoundError(2, 'No such file or directory') during <class 'jina.serve.runtimes.worker.WorkerRuntime'> initialization [09/21/22 18:32:49]
       add "--quiet-error" to suppress the exception details
       ╭──────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────╮
       │ /usr/local/lib/python3.8/dist-packages/jina/orchestrate/pods/__init__.py:74 in run                        │
       │                                                                                                           │
       │    71 │                                                                                                   │
       │    72 │   try:                                                                                            │
       │    73 │   │   _set_envs()                                                                                 │
       │ ❱  74 │   │   runtime = runtime_cls(                                                                      │
       │    75 │   │   │   args=args,                                                                              │
       │    76 │   │   )                                                                                           │
       │    77 │   except Exception as ex:                                                                         │
       │                                                                                                           │
       │ /usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py:32 in __init__              │
       │                                                                                                           │
       │    29 │   │   :param kwargs: keyword args                                                                 │
       │    30 │   │   """                                                                                         │
       │    31 │   │   self._health_servicer = health.HealthServicer(experimental_non_blocking=True)               │
       │ ❱  32 │   │   super().__init__(args, **kwargs)                                                            │
       │    33 │                                                                                                   │
       │    34 │   async def async_setup(self):                                                                    │
       │    35 │   │   """                                                                                         │

The Traceback block continues through the stack, and then:

       FileNotFoundError: [Errno 2] No such file or directory: '../glid-3-xl/finetune.pt'
DEBUG  diffusion/rep-0@381 process terminated
DEBUG  diffusion/rep-0@371 waiting for ready or shutdown signal from runtime                                         [09/21/22 18:32:49]
DEBUG  diffusion/rep-0@371 shutdown is is already set. Runtime will end gracefully on its own
DEBUG  diffusion/rep-0@371 terminating the runtime process
DEBUG  diffusion/rep-0@371 runtime process properly terminated
DEBUG  diffusion/rep-0@371 terminated
DEBUG  diffusion/rep-0@371 joining the process
DEBUG  diffusion/rep-0@371 successfully joined the process

And then at the very end:

Traceback (most recent call last):
  File "/usr/local/bin/jina", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/jina_cli/__init__.py", line 143, in main
    getattr(api, args.cli.replace('-', '_'))(args)
  File "/usr/local/lib/python3.8/dist-packages/jina_cli/api.py", line 173, in flow
    with f:
  File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py", line 1416, in __enter__
    return self.start()
  File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/builder.py", line 33, in arg_wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py", line 1472, in start
    self._wait_until_all_ready()
  File "/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py", line 1571, in _wait_until_all_ready
    raise RuntimeFailToStart
jina.excepts.RuntimeFailToStart

corporealfunk commented 1 year ago

OK, so I'm not a Docker expert, but this is a permissions issue with how the dalle user inside the container accesses volumes mounted from the host. Inside the container, the user has UID 1000, so the host folder being shared must be owned by UID 1000. Also, the README has the docker run -v option backwards. In Docker, the flag for mounting volumes like that is:

-v host_src:container_dest

That means, to launch a container from the image, you really need to run this:

docker run -p 51005:51005 -it -v /home/ubuntu/.cache:/home/dalle/.cache --gpus all jinaai/dalle-flow

On the local host the directory /home/ubuntu/.cache needs to be present and needs to be owned by UID 1000.
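
The ownership requirement and the -v ordering can be checked before launching the container. This is a rough sketch only: the helper names `owned_by` and `volume_flag` are made up for illustration, and the default UID of 1000 and the container path /home/dalle/.cache are taken from the description above rather than verified against the image.

```python
import os


def owned_by(path, uid=1000):
    """Return True if `path` exists and its owner UID equals `uid`."""
    try:
        return os.stat(path).st_uid == uid
    except FileNotFoundError:
        return False


def volume_flag(host_dir, container_dir="/home/dalle/.cache"):
    """Build the -v argument in Docker's host_src:container_dest order."""
    return f"{os.path.abspath(host_dir)}:{container_dir}"


if __name__ == "__main__":
    host_cache = os.path.expanduser("~/.cache")
    if not owned_by(host_cache):
        print(f"{host_cache} is missing or not owned by UID 1000")
    print("-v", volume_flag(host_cache))
```

If the check fails, creating the directory and changing its owner on the host (for example with chown) before starting the container should bring it in line with what is described above.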

delgermurun commented 1 year ago

If the prebuilt image doesn't work, try building the image locally by running the command below:

docker build --build-arg GROUP_ID=$(id -g ${USER}) --build-arg USER_ID=$(id -u ${USER}) -t jinaai/dalle-flow .

Then run it with a similar command:

docker run -p 51005:51005 \
  -it \
  -v $HOME/.cache:/home/dalle/.cache \
  --gpus all \
  jinaai/dalle-flow

I believe this will resolve the issue. If not, feel free to reopen.