microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0
5.83k stars 990 forks source link

About multiple-thread attention computation on CPU using zero-inference example. #886

Open luckyq opened 3 months ago

luckyq commented 3 months ago

Hi,

I am trying to test the attention computation on the CPU with zero-interference.

I use the following command to run the script.

BSZ=96
LOG_DIR=$BASE_LOG_DIR/${MODEL_NAME}_bs${BSZ}
mkdir -p  $LOG_DIR
deepspeed --num_gpus 1 run_model.py --dummy --model ${FULL_MODEL_NAME} --batch-size ${BSZ} --cpu-offload --pin-memory 1 --offload-dir /tmp/data/dxu --gen-len 32 --pin-memory 1 --kv-offload --async_kv_offload

Duing the exection, I only saw one core is used. Screenshot 2024-04-04 at 3 11 39 PM

However, there are many processes are created for the run.

Are there any other parameters or env I should configure to enable multiple-cores?