Closed: gbertulf closed this issue 4 months ago
Synced with @gbertulf offline; this problem was caused by a bash syntax error in the startup script. After the fix, both devices are busy, but this error still appears:
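The exact script is not shown in the thread, but a minimal sketch of the kind of bash slip that can silently collapse a run to one process looks like this (the variable names are hypothetical, not the actual BigDL script):

```shell
#!/usr/bin/env bash
# Hypothetical launch-script fragment. A classic bash syntax slip is writing
#   NUM_GPUS = 2
# with spaces around '=': bash parses that as a command named NUM_GPUS, the
# assignment never happens, and the world size silently falls back to 1.

NUM_GPUS=2                    # correct: no spaces around '='
WORLD_SIZE=${NUM_GPUS:-1}     # defaults to 1 only if NUM_GPUS is unset
echo "world_size=${WORLD_SIZE}"
```

With the broken form, `WORLD_SIZE` resolves to 1 and only one GPU ever receives work, matching the utilization pattern reported below.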
The issue is resolved. Closing this ticket. Thank you, team, for your help.
I followed the steps from this GitHub link -- https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/README.md -- and attempted to verify 2-GPU inference runs with these token combinations:
1) Initial run using the default script with sym_int4 and 32 tokens
Note: this run used only one GPU, as the world_size was 1.
2) Also tried sym_int4, sym_int8, fp8, and fp16 with token sizes 2048x128 and 2048x256. The general observation using xpu-smi is shown below:
Notes:
1) The top portion is the xpu-smi dump for device 0; the bottom portion is the xpu-smi dump for device 1.
2) Notice device 0 shows 99% GPU utilization while device 1 shows close to 0% utilization.
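This utilization pattern is worth reading carefully: with AutoTP actually splitting the model across two ranks, both devices should be similarly busy, so a near-idle device 1 is the signature of a single-process run. A trivial sketch of that check (the sampled values are illustrative, standing in for real xpu-smi output):

```shell
util_dev0=99   # illustrative samples, not real xpu-smi readings
util_dev1=0

# With 2-way tensor parallelism both devices should be busy; a near-idle
# second device suggests only one rank was actually launched.
if [ "${util_dev1}" -lt 10 ] && [ "${util_dev0}" -gt 90 ]; then
  diagnosis="likely single-GPU run (world_size=1)"
else
  diagnosis="both devices busy"
fi
echo "${diagnosis}"
```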
Kindly advise if there is any intermediate step required to achieve the expected 2-GPU processes.
Please note that I am running the inference on this system: NF5468-M6 with 8x Intel Flex 170 GPUs. Full system spec details are available here: https://wiki.ith.intel.com/display/MediaWiki/Flex-170x8+%28Inspur+-+ICX%29+Qualification
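As a quick pre-flight check on a box like this, the launch script can count the DRM render nodes it can see before spawning ranks. This is a hedged sketch assuming a standard Linux DRM setup; on a healthy 8x Flex 170 system the count would be expected to match the GPU count, though the exact node-to-GPU mapping depends on the driver:

```shell
# Count the DRM render nodes visible to the launch script. On this system
# the expectation would be one node per Flex 170 GPU, but the mapping is
# driver-dependent, so treat the number as a sanity check, not ground truth.
num_devices=$(ls /dev/dri/renderD* 2>/dev/null | wc -l)
echo "visible render nodes: ${num_devices}"
```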