jina-ai / dalle-flow

🌊 A Human-in-the-Loop workflow for creating HD images from text
grpcs://dalle-flow.dev.jina.ai

Several errors in Docker #46

Closed: ghost closed this issue 2 years ago

ghost commented 2 years ago

Hello, I'm running into errors when running in Docker.

Right off the bat it throws this error:

```
ERROR rerank/rep-0@21 ImportError('can not import module [06/20/22 13:48:52]
   from /dalle/dalle-flow/executors/rerank/executor.py') during
   <class 'jina.serve.runtimes.worker.WorkerRuntime'> initialization
   add "--quiet-error" to suppress the exception details
   ╭──────── Traceback (most recent call last) ────────╮
   │ /usr/local/lib/python3.8/dist-packages/jina/impo… │
   │ in _path_import                                   │
   │                                                   │
   │   124 │   │   spec = importlib.util.spec_from_fil │
   │   125 │   │   module = importlib.util.module_from │
   │   126 │   │   sys.modules[spec_name] = module     │
   │ ❱ 127 │   │   spec.loader.exec_module(module)     │
   │   128 │   except Exception as ex:                 │
   │   129 │   │   raise ImportError(f'can not import  │
   │   130                                             │
   │ :848 in                                           │
   │ exec_module                                       │
   │ :219 in                                           │
   │ _call_with_frames_removed                         │
   │                                                   │
   │ /dalle/dalle-flow/executors/rerank/executor.py:1  │
   │ in                                                │
   │                                                   │
   │ ❱  1 from clip_client import Client               │
   │    2 from jina import Executor, requests, Documen │
   │    3                                              │
   │    4                                              │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/clip_clie… │
   │ in                                                │
   │                                                   │
   │    5 from clip_client.client import Client        │
   │    6                                              │
   │    7 if 'NO_VERSION_CHECK' not in os.environ:     │
   │ ❱  8 │   from clip_server.helper import is_latest │
   │    9 │                                            │
   │   10 │   is_latest_version(github_repo='clip-as-s │
   │   11                                              │
   ╰───────────────────────────────────────────────────╯
   ModuleNotFoundError: No module named 'clip_server'

   The above exception was the direct cause of the
   following exception:

   ╭──────── Traceback (most recent call last) ────────╮
   │ /usr/local/lib/python3.8/dist-packages/jina/orch… │
   │ in run                                            │
   │                                                   │
   │    71 │                                           │
   │    72 │   try:                                    │
   │    73 │   │   _set_envs()                         │
   │ ❱  74 │   │   runtime = runtime_cls(              │
   │    75 │   │   │   args=args,                      │
   │    76 │   │   )                                   │
   │    77 │   except Exception as ex:                 │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/serv… │
   │ in __init__                                       │
   │                                                   │
   │    28 │   │   :param kwargs: keyword args         │
   │    29 │   │   """                                 │
   │    30 │   │   self._health_servicer = health.Heal │
   │ ❱  31 │   │   super().__init__(args, **kwargs)    │
   │    32 │                                           │
   │    33 │   async def async_setup(self):            │
   │    34 │   │   """                                 │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/serv… │
   │ in __init__                                       │
   │                                                   │
   │    63 │   │   │   )                               │
   │    64 │   │                                       │
   │    65 │   │   self._setup_monitoring()            │
   │ ❱  66 │   │   self._loop.run_until_complete(self. │
   │    67 │                                           │
   │    68 │   def run_forever(self):                  │
   │    69 │   │   """                                 │
   │                                                   │
   │ /usr/lib/python3.8/asyncio/base_events.py:616 in  │
   │ run_until_complete                                │
   │                                                   │
   │    613 │   │   if not future.done():              │
   │    614 │   │   │   raise RuntimeError('Event loop │
   │    615 │   │                                      │
   │ ❱  616 │   │   return future.result()             │
   │    617 │                                          │
   │    618 │   def stop(self):                        │
   │    619 │   │   """Stop running the event loop.    │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/serv… │
   │ in async_setup                                    │
   │                                                   │
   │    55 │   │   else:                               │
   │    56 │   │   │   self._summary_time = contextlib │
   │    57 │   │                                       │
   │ ❱  58 │   │   await self._async_setup_grpc_server │
   │    59 │                                           │
   │    60 │   async def _async_setup_grpc_server(self │
   │    61 │   │   """                                 │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/serv… │
   │ in _async_setup_grpc_server                       │
   │                                                   │
   │    65 │   │   # Keep this initialization order    │
   │    66 │   │   # otherwise readiness check is not  │
   │    67 │   │   # The DataRequestHandler needs to b │
   │ ❱  68 │   │   self._data_request_handler = DataRe │
   │    69 │   │   │   self.args, self.logger, self.me │
   │    70 │   │   )                                   │
   │    71                                             │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/serv… │
   │ in __init__                                       │
   │                                                   │
   │    38 │   │   self.args.parallel = self.args.shar │
   │    39 │   │   self.logger = logger                │
   │    40 │   │   self._is_closed = False             │
   │ ❱  41 │   │   self._load_executor(metrics_registr │
   │    42 │   │   self._init_monitoring(metrics_regis │
   │    43 │                                           │
   │    44 │   def _init_monitoring(self, metrics_regi │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/serv… │
   │ in _load_executor                                 │
   │                                                   │
   │    76 │   │   :param metrics_registry: Optional p │
   │       passed to the executor so that it can expos │
   │    77 │   │   """                                 │
   │    78 │   │   try:                                │
   │ ❱  79 │   │   │   self._executor: BaseExecutor =  │
   │    80 │   │   │   │   self.args.uses,             │
   │    81 │   │   │   │   uses_with=self.args.uses_wi │
   │    82 │   │   │   │   uses_metas=self.args.uses_m │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/jaml… │
   │ in load_config                                    │
   │                                                   │
   │   727 │   │   │                                   │
   │   728 │   │   │   if allow_py_modules:            │
   │   729 │   │   │   │   _extra_search_paths = extra │
   │ ❱ 730 │   │   │   │   load_py_modules(            │
   │   731 │   │   │   │   │   no_tag_yml,             │
   │   732 │   │   │   │   │   extra_search_paths=(_ex │
   │   733 │   │   │   │   │   if s_path               │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/jaml… │
   │ in load_py_modules                                │
   │                                                   │
   │   267 │   │   │   )                               │
   │   268 │   │                                       │
   │   269 │   │   mod = [complete_path(m, extra_searc │
   │ ❱ 270 │   │   PathImporter.add_modules(*mod)      │
   │   271                                             │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/impo… │
   │ in add_modules                                    │
   │                                                   │
   │   149 │   │   │   │   │   f'cannot import module  │
   │   150 │   │   │   │   )                           │
   │   151 │   │   │                                   │
   │ ❱ 152 │   │   │   _path_import(p)                 │
   │   153                                             │
   │                                                   │
   │ /usr/local/lib/python3.8/dist-packages/jina/impo… │
   │ in _path_import                                   │
   │                                                   │
   │   126 │   │   sys.modules[spec_name] = module     │
   │   127 │   │   spec.loader.exec_module(module)     │
   │   128 │   except Exception as ex:                 │
   │ ❱ 129 │   │   raise ImportError(f'can not import  │
   │   130                                             │
   │   131                                             │
   │   132 class PathImporter:                         │
   ╰───────────────────────────────────────────────────╯
   ImportError: can not import module from
   /dalle/dalle-flow/executors/rerank/executor.py
```

Then, after downloading the Mega model from wandb:

`ModuleNotFoundError: No module named 'clip_server'`

Then it gives me a CUDA out-of-memory error (I have a 24 GB RTX 3090; that shouldn't be a problem, right?) and then gets stuck at:

```
wandb: Downloading large artifact mega-1-fp16:latest, 4938.53MB. 7 files... Done. 0:0:7.3
device count: 1
DEBUG dalle/rep-0@19 <executor.DalleGenerator object at       [06/20/22 13:51:01]
      0x7f571825dc40> is successfully loaded!
DEBUG dalle/rep-0@19 start listening on 0.0.0.0:59167
DEBUG dalle/rep-0@ 1 ready and listening                      [06/20/22 13:51:01]
ERROR Flow@ 1 Flow is aborted due to ['diffusion',            [06/20/22 13:51:01]
      'rerank'] can not be started.
DEBUG gateway/rep-0@ 1 waiting for ready or shutdown signal   [06/20/22 13:51:01]
      from runtime
DEBUG gateway/rep-0@ 1 terminate
DEBUG gateway/rep-0@ 1 terminating the runtime process
DEBUG gateway/rep-0@ 1 runtime process properly terminated
DEBUG gateway/rep-0@ 1 terminated                             [06/20/22 13:51:02]
DEBUG gateway/rep-0@ 1 joining the process
DEBUG gateway/rep-0@35 process terminated                     [06/20/22 13:51:02]
DEBUG gateway/rep-0@ 1 successfully joined the process
DEBUG store/rep-0@ 1 waiting for ready or shutdown signal     [06/20/22 13:51:02]
      from runtime
DEBUG store/rep-0@ 1 terminate
DEBUG store/rep-0@ 1 terminating the runtime process
DEBUG store/rep-0@ 1 runtime process properly terminated
DEBUG store/rep-0@23 cancel WorkerRuntime                     [06/20/22 13:51:02]
DEBUG store/rep-0@23 stopped GRPC Server
DEBUG store/rep-0@23 cancel WorkerRuntime
DEBUG store/rep-0@23 stopped GRPC Server
DEBUG store/rep-0@ 1 terminated
DEBUG store/rep-0@ 1 joining the process
DEBUG store/rep-0@23 process terminated                       [06/20/22 13:51:02]
DEBUG store/rep-0@ 1 successfully joined the process
DEBUG upscaler/rep-0@ 1 waiting for ready or shutdown         [06/20/22 13:51:02]
      signal from runtime
DEBUG upscaler/rep-0@ 1 terminate
DEBUG upscaler/rep-0@ 1 terminating the runtime process
DEBUG upscaler/rep-0@ 1 runtime process properly terminated
DEBUG upscaler/rep-0@22 cancel WorkerRuntime                  [06/20/22 13:51:02]
DEBUG upscaler/rep-0@22 stopped GRPC Server
DEBUG upscaler/rep-0@22 cancel WorkerRuntime
DEBUG upscaler/rep-0@22 stopped GRPC Server
DEBUG upscaler/rep-0@ 1 terminated
DEBUG upscaler/rep-0@ 1 joining the process
DEBUG upscaler/rep-0@22 process terminated                    [06/20/22 13:51:02]
DEBUG upscaler/rep-0@ 1 successfully joined the process
DEBUG dalle/rep-0@ 1 waiting for ready or shutdown signal     [06/20/22 13:51:02]
      from runtime
DEBUG dalle/rep-0@ 1 terminate
DEBUG dalle/rep-0@ 1 terminating the runtime process
DEBUG dalle/rep-0@ 1 runtime process properly terminated
DEBUG dalle/rep-0@19 cancel WorkerRuntime                     [06/20/22 13:51:02]
DEBUG dalle/rep-0@19 stopped GRPC Server
DEBUG dalle/rep-0@19 cancel WorkerRuntime
DEBUG dalle/rep-0@19 stopped GRPC Server
DEBUG dalle/rep-0@ 1 terminated
DEBUG dalle/rep-0@19 process terminated                       [06/20/22 13:51:02]
DEBUG dalle/rep-0@ 1 joining the process
```

Any ideas what the problem is? I'm running Ubuntu on WSL2 under Windows 10.

borrecw commented 2 years ago

`pip install clip-server` resolves one issue. I can't figure out the `executor.py` issue, but it appears to be fixed in the `feat-dockerize` branch.
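Before rebuilding the image, it can help to confirm from inside the container which of the modules named in the traceback are actually installed. A minimal sketch using only the standard library; the module list (`clip_client`, `clip_server`, `jina`) is taken from the traceback, and the helper name `missing_modules` is hypothetical:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found.

    find_spec only locates a module; it does not execute it, so a package
    that is installed but fails at import time (as clip_client does when
    clip_server is absent) is still reported as present here.
    """
    missing = []
    for name in names:
        # find_spec returns None when no installed package provides `name`
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level modules that appear in the rerank executor traceback
    print(missing_modules(["clip_client", "clip_server", "jina"]))
```

Run it inside the container with the same Python 3.8 interpreter the executors use; an empty list only means the packages are installed, not that they import cleanly. The traceback also shows that `clip_client` imports `clip_server` only when the `NO_VERSION_CHECK` environment variable is unset, so setting that variable might sidestep this particular import; that is an inference from the traceback, not something tested in this thread.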

ghost commented 2 years ago

Thank you for the reply, @borrecw. I added `clip-server` to `requirements.txt` and ran the build again. Everything worked until it threw a bunch of CUDA out-of-memory errors again.

Is it even possible to run DALL·E Flow on a 3090? I had only 500 MB of the 24 GB allocated before running `docker run`, so I should have the required 21 GB free. The Flow ends up printing the server URLs, but none of them seem to work in the Colab notebook. I tried the local, public, and private addresses and I'm not sure which one is supposed to be used.
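The headroom estimate in the question (24 GB card, roughly 0.5 GB already allocated, 21 GB required) can be checked from inside the container at start-up, which is what the executors actually see. A minimal sketch, assuming PyTorch with CUDA support is installed in the image; `free_vram_gib` and `has_headroom` are hypothetical helper names, sizes are reported in GiB, and the 21 GB threshold is simply the figure quoted in the question:

```python
def free_vram_gib(device=0):
    """Return (free, total) memory in GiB for a CUDA device, or None.

    Returns None when PyTorch is not installed or no CUDA device is visible,
    which is itself a useful signal when debugging GPU access in a container.
    """
    try:
        import torch  # assumption: PyTorch is present in the image
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info reports device-wide free/total bytes, so memory held by
    # other processes (including the desktop under WSL2) is already subtracted.
    free_b, total_b = torch.cuda.mem_get_info(device)
    return free_b / 2**30, total_b / 2**30


def has_headroom(free_gib, required_gib=21.0):
    """True if the reported free memory meets the quoted requirement."""
    return free_gib >= required_gib


if __name__ == "__main__":
    info = free_vram_gib()
    if info is None:
        print("No CUDA device visible to PyTorch")
    else:
        free_gib, total_gib = info
        print(f"free {free_gib:.1f} / {total_gib:.1f} GiB, enough: {has_headroom(free_gib)}")
```

This is a snapshot taken before any model loads. With a single device (`device count: 1` in the log above), every executor that uses the GPU draws from the same pool, so the free figure at start-up is an upper bound on what the last executor to load will find.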


borrecw commented 2 years ago

I had the branch running in Docker on V100s on an AWS p3 spot instance; everything worked except upscaling before I was preempted.

hanxiao commented 2 years ago

Closing for now, as we are working on providing an auto-built Docker image in the next few hours. Feel free to reopen the issue if the new image still doesn't work.

ghost commented 2 years ago

Hello, I ran the prebuilt image, and even though I still get a CUDA out-of-memory error, the server seems to run nonetheless. However, after copying and pasting the private or public address into the Colab notebook, I still get this error:

[screenshot of the error]