Closed jentur-zabbeJ-8basdy closed 1 year ago
OOM killed
Default script
```bash
set -x
export BS=${BS:-16}
export MEMCAP=${MEMCAP:-0}
export GPUNUM=${GPUNUM:-1}
export MODLE_PATH="facebook/opt-${MODEL}"
model_name_or_path=./opt6.7b
# HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1
torchrun \
  --nproc_per_node ${GPUNUM} \
  --master_port 19198 \
  train_gemini_opt.py \
  --mem_cap ${MEMCAP} \
  --model_name_or_path ${model_name_or_path} \
  --batch_size ${BS}
```
Environment
Version: torch 1.12+cu113
deepspeed: 0.7.7
Memory: 80G
Originally posted by @iMountTai in https://github.com/hpcaitech/ColossalAI/issues/2772