meta-llama / llama-stack

Composable building blocks to build Llama Apps

`llama-stack run` with meta reference inference provider fails with ModuleNotFoundError #180

Open romilbhardwaj opened 1 month ago

romilbhardwaj commented 1 month ago

I'm following the getting started guide for llama-stack. When I run `llama stack run 8b-instruct`, it fails with a ModuleNotFoundError:

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
worker_process_entrypoint FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-10-03_19:42:04
  host      : l4-2ea4-head-7te0mjrn-compute.us-east4-a.c.skypilot-375900.internal
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 18162)
  error_file: /var/tmp/torchelastic_3iqbc8v4/76b62c9e-8c1c-4ed5-a452-a6863e0f9297_aous0lgf/attempt_0/0/error.json
  traceback : Traceback (most recent call last):
    File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
      return f(*args, **kwargs)
    File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/parallel_utils.py", line 131, in worker_process_entrypoint
      model = init_model_cb()
    File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/model_parallel.py", line 50, in init_model_cb
      llama = Llama.build(config)
    File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/generation.py", line 84, in build
      from .quantization.loader import is_fbgemm_available
    File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/quantization/loader.py", line 16, in <module>
      from llama_models.llama3.api.model import Transformer, TransformerBlock
  ModuleNotFoundError: No module named 'llama_models.llama3.api.model'

============================================================

Full logs, including the stack setup and configuration, are here: https://gist.github.com/romilbhardwaj/f21c3b1908b62ec5a906b321739d30cb

Versions:

$ pip freeze | grep llama
llama_models==0.0.39
llama_stack==0.0.39

Here's the full pip freeze: https://gist.github.com/romilbhardwaj/b05e950eeb03d5647d738382ba92f2a1

I tried with previous versions of llama_models, but that didn't work either. Am I missing something?
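For what it's worth, a quick way to confirm that the module named in the failing import really is absent from the installed llama_models (rather than some environment mix-up on my side) is something like this sketch:

```python
# Sanity check: does the module from the failing import exist in the
# installed llama_models package? (find_spec imports parent packages, so
# guard against those being missing as well.)
import importlib.util

name = "llama_models.llama3.api.model"
try:
    spec = importlib.util.find_spec(name)
except ModuleNotFoundError:
    spec = None
print(f"{name}: {'present' if spec is not None else 'not found'}")
```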

ashwinb commented 1 month ago

Looks like our fp8 code has rotted a bit. It's a bad import. I will fix this up quick.
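For anyone hitting this before the fix lands, the general shape of a local workaround (a sketch, not the actual patch) is to avoid importing the quantization loader unless fp8 is actually requested, and to treat an ImportError there as "fbgemm unavailable":

```python
# Hypothetical guard, not the upstream fix: the bf16 path should not need
# the quantization loader at all, so an ImportError here can simply mean
# "no fp8 support available".
def fbgemm_available() -> bool:
    try:
        # This is the import chain that currently raises ModuleNotFoundError.
        from llama_stack.providers.impls.meta_reference.inference.quantization.loader import (
            is_fbgemm_available,
        )
    except ImportError:
        return False
    return is_fbgemm_available()
```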

romilbhardwaj commented 1 month ago

Thanks @ashwinb. I tried with bf16 and got the following:

(base) gcpuser@l4-2ea4-head-7te0mjrn-compute:~$ llama stack run 8b-instruct
Resolved 8 providers in topological order
  Api.models: routing_table
  Api.inference: router
  Api.shields: routing_table
  Api.safety: router
  Api.memory_banks: routing_table
  Api.memory: router
  Api.agents: meta-reference
  Api.telemetry: meta-reference

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
E1003 20:29:45.293000 140270447350976 torch/distributed/elastic/multiprocessing/api.py:702] failed (exitcode: -9) local_rank: 0 (pid: 22415) of fn: worker_process_entrypoint (start_method: fork)
E1003 20:29:45.293000 140270447350976 torch/distributed/elastic/multiprocessing/api.py:702] Traceback (most recent call last):
E1003 20:29:45.293000 140270447350976 torch/distributed/elastic/multiprocessing/api.py:702]   File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 659, in _poll
E1003 20:29:45.293000 140270447350976 torch/distributed/elastic/multiprocessing/api.py:702]     self._pc.join(-1)
E1003 20:29:45.293000 140270447350976 torch/distributed/elastic/multiprocessing/api.py:702]   File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 170, in join
E1003 20:29:45.293000 140270447350976 torch/distributed/elastic/multiprocessing/api.py:702]     raise ProcessExitedException(
E1003 20:29:45.293000 140270447350976 torch/distributed/elastic/multiprocessing/api.py:702] torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGKILL
Process ForkProcess-1:
Traceback (most recent call last):
  File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/parallel_utils.py", line 175, in launch_dist_group
    elastic_launch(launch_config, entrypoint=worker_process_entrypoint)(
  File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/envs/llamastack-8b-instruct/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
worker_process_entrypoint FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-10-03_20:29:45
  host      : l4-2ea4-head-7te0mjrn-compute.us-east4-a.c.skypilot-375900.internal
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 22415)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 22415
============================================================

Is there some way to debug or view logs to find the reason for the SIGKILL? FWIW, I'm trying to run Llama3.1-8B-Instruct on 1x L4 GPU.
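In case it helps with debugging, here is a quick sketch I can run to dump the memory picture on this box (assumes torch is already in the env, which it is here):

```python
# Rough diagnostic sketch: print available host RAM and total GPU memory,
# since an exit with signal 9 is often the kernel OOM killer at work.
import os
import torch

page = os.sysconf("SC_PAGE_SIZE")
avail_ram_gb = os.sysconf("SC_AVPHYS_PAGES") * page / 1e9
total_ram_gb = os.sysconf("SC_PHYS_PAGES") * page / 1e9
print(f"host RAM: {avail_ram_gb:.1f} GB available of {total_ram_gb:.1f} GB")

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU 0 ({props.name}): {props.total_memory / 1e9:.1f} GB total")
```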

ashwinb commented 1 month ago

@romilbhardwaj A SIGKILL is typically due to an OOM. Could you perhaps try with Llama3.2-1B-Instruct?
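If you want to confirm the OOM theory first, something along these lines should show whether the kernel OOM killer fired, plus a back-of-envelope on the weights (a sketch; the parameter count and the L4's 24 GB VRAM figure are approximate, and `dmesg` may need elevated permissions on that VM):

```python
# Hypothetical check, not part of llama-stack: look for OOM-killer messages
# in the kernel log and do rough sizing math for the 8B model in bf16.
import subprocess

log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
oom_lines = [l for l in log.splitlines()
             if "out of memory" in l.lower() or "oom-kill" in l.lower()]
print("\n".join(oom_lines) or "no OOM-killer messages found")

params = 8.03e9                 # approx. parameter count for Llama3.1-8B
weights_gb = params * 2 / 1e9   # bf16 = 2 bytes per parameter
# ~16 GB of weights fits a 24 GB L4, but the checkpoint is typically staged
# through host RAM first, which is where a small VM can get SIGKILLed.
print(f"~{weights_gb:.0f} GB of weights alone")
```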