Open luckyq opened 3 months ago
Hi,
I am trying to test the attention computation on the CPU with ZeRO-Inference.
I use the following command to run the script.
BSZ=96
LOG_DIR=$BASE_LOG_DIR/${MODEL_NAME}_bs${BSZ}
mkdir -p $LOG_DIR
deepspeed --num_gpus 1 run_model.py --dummy --model ${FULL_MODEL_NAME} --batch-size ${BSZ} --cpu-offload --pin-memory 1 --offload-dir /tmp/data/dxu --gen-len 32 --pin-memory 1 --kv-offload --async_kv_offload
During the execution, I saw only one core being used.
However, many processes are created for the run.
Are there any other parameters or environment variables I should configure to enable multiple cores?
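For reference, here is a minimal sketch of how the CPU visibility of the process could be checked from Python before launching the run. It is only a diagnostic under stated assumptions: the helper name `cpu_visibility_report` is hypothetical, the affinity call is Linux-only, and I am assuming (not confirming) that thread-count variables such as `OMP_NUM_THREADS` or a restricted CPU affinity could be related to the single-core behavior.

```python
import os


def cpu_visibility_report(env=None):
    """Collect basic facts about how many cores this process may use.

    Hypothetical helper for illustration; it is not part of DeepSpeed.
    """
    env = os.environ if env is None else env
    report = {"logical_cpus": os.cpu_count()}

    # sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        report["allowed_cpus"] = sorted(os.sched_getaffinity(0))
    else:
        report["allowed_cpus"] = None

    # Common thread-count variables; None means the variable is unset.
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        report[name] = env.get(name)
    return report


if __name__ == "__main__":
    for key, value in cpu_visibility_report().items():
        print(f"{key}: {value}")
```

If `allowed_cpus` lists a single core, or `OMP_NUM_THREADS` is set to 1, that would be one possible explanation to rule out, though it may not be the cause here.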