Closed: @Chrecci closed this issue 3 months ago.
Thanks for the report, @Chrecci! Unfortunately, we only support CUDA right now. I'll add a note to the README and add Apple Silicon support to our team's backlog.
In the interim, you can set up the inference server on a remote machine with CUDA, and then run agentic systems on the Mac.
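For reference, here is a minimal way to sanity-check that wiring from the Mac once the server is up on the remote box. This is a sketch assuming the server keeps the default port 5000 seen in the log below; `cuda-host.example.com` is a placeholder for your own machine:

```python
import socket

# Placeholder hostname; substitute the CUDA machine reachable from your Mac.
HOST, PORT = "cuda-host.example.com", 5000

# If this succeeds, the agentic system on the Mac can be pointed at
# HOST:PORT instead of a local inference server.
with socket.create_connection((HOST, PORT), timeout=5):
    print(f"inference server reachable at {HOST}:{PORT}")
```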
```
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
initialize(config_path=relative_path)
Loading config from : /x/x/.llama/configs/inference.yaml
Yaml config:
```
```yaml
inference_config:
  impl_config:
    impl_type: inline
    checkpoint_config:
      checkpoint:
        checkpoint_type: pytorch
        checkpoint_dir: /.llama/checkpoints/Meta-Llama-3.1-8B-Instruct/
        tokenizer_path: /.llama/checkpoints/Meta-Llama-3.1-8B-Instruct/tokenizer.model
        model_parallel_size: 1
        quantization_format: bf16
    quantization: null
    torch_seed: null
    max_seq_len: 16384
    max_batch_size: 1
```
```
Listening on :::5000
INFO: Started server process [74351]
INFO: Waiting for application startup.
W0725 17:29:07.226000 7904910400 torch/distributed/elastic/multiprocessing/redirects.py:28] NOTE: Redirects are currently not supported in Windows or MacOs.
...
  File "/llama-agentic-system/venv_3_10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
worker_process_entrypoint FAILED
Failures:
```
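For anyone else landing here: the `ChildFailedError` is the spawned worker dying because there is no CUDA device on an Apple Silicon Mac. A quick way to confirm that locally, using standard PyTorch APIs:

```python
import torch

# On Apple Silicon there is no CUDA device, which is why the worker
# process fails under torch.distributed; MPS is the Metal-backed backend.
print(torch.cuda.is_available())          # False on an Apple Silicon Mac
print(torch.backends.mps.is_available())  # True on recent macOS + PyTorch
```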
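As an aside, the Hydra warning at the top of the log is harmless: it is emitted because `initialize()` is called without `version_base`. If you control the call site, passing it explicitly silences the warning. A sketch against Hydra's public API, with an illustrative config path and name:

```python
from hydra import compose, initialize

# version_base=None opts into current Hydra behavior; "1.1" would pin the
# old defaults instead. Either value silences the UserWarning seen above.
# config_path/config_name here are illustrative, not the repo's actual values.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="inference")
```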