microsoft / winfile

Original Windows File Manager (winfile) with enhancements
MIT License
6.82k stars 706 forks

oomkilled #374

Closed jentur-zabbeJ-8basdy closed 1 year ago

jentur-zabbeJ-8basdy commented 1 year ago

oomkilled

[image]

Default script:

```shell
set -x
export BS=${BS:-16}
export MEMCAP=${MEMCAP:-0}
export GPUNUM=${GPUNUM:-1}

export MODLE_PATH="facebook/opt-${MODEL}"
model_name_or_path=./opt6.7b

# HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1
torchrun \
  --nproc_per_node ${GPUNUM} \
  --master_port 19198 \
  train_gemini_opt.py \
  --mem_cap ${MEMCAP} \
  --model_name_or_path ${model_name_or_path} \
  --batch_size ${BS}
```
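For readers less familiar with the `${VAR:-default}` form used above, here is a minimal sketch of what the script resolves to. It assumes none of the variables were set beforehand, which the report does not state. It only prints the launch command rather than running it, and `cmd` is a hypothetical helper variable that is not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch only: resolve the defaults from the reported script and print the
# launch command instead of running it.

# Assumption: none of these were exported before the script ran.
unset BS MEMCAP GPUNUM

export BS=${BS:-16}          # passed to --batch_size, falls back to 16
export MEMCAP=${MEMCAP:-0}   # passed to --mem_cap, falls back to 0
export GPUNUM=${GPUNUM:-1}   # passed to --nproc_per_node, falls back to 1
model_name_or_path=./opt6.7b

# cmd is a hypothetical variable used here only to display the command.
cmd="torchrun --nproc_per_node ${GPUNUM} --master_port 19198 train_gemini_opt.py --mem_cap ${MEMCAP} --model_name_or_path ${model_name_or_path} --batch_size ${BS}"
echo "$cmd"
```

Setting a variable before the script runs (for example `BS=4`) replaces the default, because `${BS:-16}` only falls back to 16 when `BS` is unset or empty.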

Environment

Version: torch1.12+cu113

deepspeed: 0.7.7

Memory: 80G

Originally posted by @iMountTai in https://github.com/hpcaitech/ColossalAI/issues/2772
