intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Users need to pass different extra_params for local mode and distributed mode to init_orca_context #3894

Open shanyu-sys opened 2 years ago

shanyu-sys commented 2 years ago

Related issues: #3867, #3891

Problem description

For some of the more complicated Ray parameters, users cannot use the same extra_params for both local mode and distributed mode.

For example, users may need to pass extra_params as below in local mode:

extra_params = {"_metrics_export_port": "10005"}

and as below in distributed mode:

extra_params = {"metrics-export-port": "10005", "worker-port-list": "10002,10003,10004"}

Internally, local mode converts '-' to '_' in each key and passes the parameters to ray.init(), while distributed mode applies the parameters directly to ray start. However, the two interfaces do not always accept the same parameters. For example, ray start takes --metrics-export-port while ray.init() takes _metrics_export_port, and worker-port-list is accepted only by ray start, not by ray.init().
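To make the mismatch concrete, here is a minimal sketch of the local-mode conversion as described above. The helper name to_ray_init_kwargs is hypothetical and is not the actual function in the code base; it only mimics the '-' to '_' replacement.

```python
def to_ray_init_kwargs(extra_params):
    """Mimic the local-mode conversion described above: '-' becomes '_' in each key.

    Hypothetical helper for illustration only, not the project's real code.
    """
    return {key.replace("-", "_"): value for key, value in extra_params.items()}


# The distributed-mode parameters from the example above.
distributed_params = {
    "metrics-export-port": "10005",
    "worker-port-list": "10002,10003,10004",
}

local_kwargs = to_ray_init_kwargs(distributed_params)
# The converted keys are "metrics_export_port" and "worker_port_list".
# Per the description above, ray.init() expects "_metrics_export_port"
# (with a leading underscore) and has no equivalent of worker-port-list,
# so these kwargs would not work in local mode.
print(sorted(local_kwargs))
```

A plain character replacement therefore cannot reconcile the two interfaces on its own.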

Possible Solution

Solution 1: Use ray start for both local mode and distributed mode.

Solution 2:

  1. Identify the parameters that require an additional leading "_" in ray.init().
  2. When running in local mode, raise a warning instead of an error for parameters that are accepted only by ray start.
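
Solution 2 could look roughly like the sketch below. The function normalize_for_local and the two lookup sets are hypothetical names, and the sets contain only the two parameters mentioned in this issue; a real implementation would need to derive them from the Ray version in use.

```python
import warnings

# Assumption: illustrative lookup tables covering only the parameters from this issue.
PRIVATE_INIT_PARAMS = {"metrics_export_port"}  # ray.init() expects these with a leading "_"
START_ONLY_PARAMS = {"worker_port_list"}       # accepted by `ray start` only


def normalize_for_local(extra_params):
    """Translate ray-start-style extra_params into kwargs for local mode.

    Hypothetical helper: adds the leading "_" where needed (step 1) and
    warns about, then skips, parameters that only `ray start` accepts (step 2).
    """
    kwargs = {}
    for key, value in extra_params.items():
        name = key.replace("-", "_")
        bare = name.lstrip("_")  # compare without any leading underscore
        if bare in START_ONLY_PARAMS:
            warnings.warn(
                f"'{key}' is only supported by `ray start`; ignoring it in local mode."
            )
            continue
        if bare in PRIVATE_INIT_PARAMS:
            name = "_" + bare
        kwargs[name] = value
    return kwargs
```

With this approach, users could pass the distributed-style dictionary in both modes, and the dictionary that already works in local mode today would pass through unchanged.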
shanyu-sys commented 2 years ago

@hkvision Could you help with the issue at your convenience?