AutoTS on yarn: n_sampling = 200

yarn-client mode:
`python lstm.py --cores 28 --num_workers 1 --cluster_mode yarn-client`
takes 81.61s, while local mode
`python lstm.py --cores 28 --num_workers 1`
takes 29.58s, so yarn-client takes about 2.75x as long as local mode. Digging into the executions, I found that the HDFS operations (`ls`, `mkdir`, `put`) consume a lot of CPU time: each operation launches a new Java process just to perform a single HDFS operation.

Originally posted by @qiuxin2012 in https://github.com/intel-analytics/BigDL/issues/6371#issuecomment-1322870974
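To make the diagnosis concrete, below is a minimal sketch (not the actual BigDL code) contrasting the pattern described above, shelling out to `hdfs dfs` once per operation and paying JVM startup cost each time, with reusing a single long-lived client. The `pyarrow.fs.HadoopFileSystem` alternative, the function names, and the paths are illustrative assumptions, just one way to avoid spawning a Java process per call.

```python
# Hypothetical sketch; function names and paths are illustrative, not from BigDL.
import os
import subprocess
import pyarrow.fs as pafs


def put_via_cli(local_path, hdfs_dir):
    # Pattern matching the observation above: every call shells out to `hdfs dfs`,
    # so each ls/mkdir/put starts a fresh JVM and does a single HDFS operation.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-ls", hdfs_dir], check=True)


def put_via_client(local_path, hdfs_dir):
    # Cheaper pattern: open one HDFS connection and reuse it for all operations
    # (assumes libhdfs is available, e.g. ARROW_LIBHDFS_DIR is set).
    hdfs = pafs.HadoopFileSystem("default", 0)  # use fs.defaultFS from core-site.xml
    hdfs.create_dir(hdfs_dir)
    target = f"{hdfs_dir}/{os.path.basename(local_path)}"
    with open(local_path, "rb") as src, hdfs.open_output_stream(target) as dst:
        dst.write(src.read())
    return hdfs.get_file_info(pafs.FileSelector(hdfs_dir))
```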