rayleizhu / BiFormer

[CVPR 2023] Official code release of our paper "BiFormer: Vision Transformer with Bi-Level Routing Attention"
https://arxiv.org/abs/2303.08810
MIT License

slurm #44

Open shuli12318 opened 4 months ago

shuli12318 commented 4 months ago

I want to know how to run the .sh script for segmentation without slurm, because slurm is hard to use.

shuli12318 commented 3 months ago

How to run semantic segmentation on a single machine (single GPU or multiple GPUs) without slurm:

1. Dataset: the dataset needs to be placed under the semantic_segmentation folder.

2. Batch size: in semantic_segmentation/configs/_base_/datasets/ade20k_sfpn.py, set samples_per_gpu=8 and workers_per_gpu=8.
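For reference, the relevant block of that config should end up looking roughly like this (a sketch of the mmseg-style `data` dict; the surrounding dataset entries are elided, not reproduced from the actual file):

```python
# semantic_segmentation/configs/_base_/datasets/ade20k_sfpn.py (sketch, excerpt)
data = dict(
    samples_per_gpu=8,   # per-GPU batch size
    workers_per_gpu=8,   # dataloader workers per GPU
    # train=..., val=..., test=...  (dataset definitions left unchanged)
)
```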

3. lr and iters: in semantic_segmentation/configs/ade20k/sfpn.biformer_small.py, adjust lr and iters following the usual linear scaling rule: gpus * batch_size * iters = constant, and lr = constant * (gpus * batch_size).
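As a worked example of that rule (the baseline numbers below are hypothetical placeholders, not values read from sfpn.biformer_small.py):

```python
# Linear scaling rule from step 3:
#   gpus * batch_size * iters = constant  -> fewer samples per step => more iters
#   lr = constant * (gpus * batch_size)   -> lr scales with the total batch size

BASE_GPUS, BASE_BATCH, BASE_ITERS, BASE_LR = 8, 2, 80_000, 1e-4  # hypothetical baseline

def scale(gpus: int, batch_size: int) -> tuple[int, float]:
    total = gpus * batch_size
    base_total = BASE_GPUS * BASE_BATCH
    iters = BASE_ITERS * base_total // total  # keep gpus * batch_size * iters constant
    lr = BASE_LR * total / base_total         # scale lr with the total batch size
    return iters, lr

print(scale(4, 8))  # 4 GPUs x samples_per_gpu=8 -> (40000, 0.0002)
```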

4. Multi-GPU training: only the .sh launch script needs to change. Below is the multi-GPU training script I use without slurm.

```bash
#!/usr/bin/env bash

NOW=$(date '+%m-%d-%H:%M:%S')

CONFIG_DIR=configs/ade20k
MODEL=sfpn.biformer_small
JOB_NAME=${MODEL}
CKPT=pretrained/biformer_small_best.pth
CONFIG=${CONFIG_DIR}/${MODEL}.py
OUTPUT_DIR=../outputs/seg
WORK_DIR=${OUTPUT_DIR}/${MODEL}/${NOW}
mkdir -p ${WORK_DIR}

export PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
export CUDA_VISIBLE_DEVICES=0,1,2,3  # GPU ids to expose

# --nproc_per_node is the number of GPUs;
# --launcher="pytorch" uses torchrun instead of the slurm launcher
torchrun \
    --nproc_per_node=4 \
    --master_port=29501 \
    train.py --config=${CONFIG} \
    --launcher="pytorch" \
    --work-dir=${WORK_DIR} \
    --options model.pretrained=${CKPT} \
    &> ${WORK_DIR}/train.${JOB_NAME}.log &
```
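A note on the script above: the trailing `&` backgrounds the run and `&>` redirects both stdout and stderr, so progress can be followed with `tail -f ${WORK_DIR}/train.${JOB_NAME}.log`; if you change `CUDA_VISIBLE_DEVICES`, keep `--nproc_per_node` equal to the number of GPU ids listed there.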

taoxingwang commented 3 months ago

Hello, is the setup the same for a single machine with a single GPU?

shuli12318 commented 3 months ago

> Hello, is the setup the same for a single machine with a single GPU?

Yes, just change `export CUDA_VISIBLE_DEVICES=0,1,2,3` (the exposed GPU ids) to `=0`, and change `--nproc_per_node=4` (the number of GPUs) to `--nproc_per_node=1`.

You also need the ckpt weight file, which is linked in the author's README: create a pretrained folder and download the checkpoint into it.
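To check that the download worked, a minimal sketch using plain PyTorch (the path assumes the pretrained folder created above):

```python
# Sketch: verify the checkpoint file loads before launching training.
import torch

ckpt = torch.load("pretrained/biformer_small_best.pth", map_location="cpu")
# Checkpoints are typically dicts with keys like 'model' or 'state_dict'
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```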

20191844308 commented 1 week ago

After making these changes as described, I get an error.