Before training on your own data set, open the train_gpu.py file and modify the data_root, batch_size and nb_classes parameters. If you want to draw the confusion matrix and ROC curve, simply uncomment the Plot_ROC and Predictor calls at the end of the code and change their third parameter to the path of your own model weights file (.pth).
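As a rough illustration, the edits usually look like the sketch below. The variable names mirror the description above, but the exact Plot_ROC / Predictor argument lists and the example paths are assumptions, not the verbatim contents of train_gpu.py.

data_root = '/path/to/your/dataset'    # root directory of your own data set
batch_size = 32                        # adjust to your GPU memory
nb_classes = 5                         # number of classes in your data set

# At the end of train_gpu.py, uncomment the evaluation helpers and set the third
# argument to your trained weights (.pth); argument names here are illustrative:
# Plot_ROC(test_loader, model, './your_weights.pth', device)
# Predictor(test_loader, model, './your_weights.pth', device)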
You can also use the Sophia optimizer; just swap the optimizer in train_gpu.py as shown below. For this training sample it can achieve better results.
# optimizer = create_optimizer(args, model_without_ddp)
optimizer = SophiaG(model.parameters(), lr=2e-4, betas=(0.965, 0.99), rho=0.01, weight_decay=args.weight_decay)
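Since SophiaG follows the standard torch.optim.Optimizer interface, the rest of the training loop stays unchanged. Below is a minimal, self-contained sketch on a toy model; it assumes SophiaG is importable from a local sophia module and omits the periodic Hessian-estimate update that the Sophia authors additionally recommend.

import torch
import torch.nn as nn
from sophia import SophiaG   # assumed import path; adjust to where SophiaG lives in your setup

model = nn.Linear(16, 4)     # toy model, stands in for the EfficientViT model
criterion = nn.CrossEntropyLoss()
optimizer = SophiaG(model.parameters(), lr=2e-4, betas=(0.965, 0.99),
                    rho=0.01, weight_decay=0.05)

for _ in range(10):          # stands in for iterating over the real data loader
    images = torch.randn(8, 16)
    targets = torch.randint(0, 4, (8,))
    loss = criterion(model(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()         # same call pattern as AdamW or any other torch optimizer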
Train on a single GPU:
python train_gpu.py
Train with multiple GPUs on a single machine (here with 8 cards):
python -m torch.distributed.launch --nproc_per_node=8 train_gpu.py
(To use only specific cards, e.g. the second and fourth GPUs:)
CUDA_VISIBLE_DEVICES=1,3 python -m torch.distributed.launch --nproc_per_node=2 train_gpu.py
(Set --nproc_per_node to the number of GPUs available on each machine. To restrict training to specific cards, prefix each command with CUDA_VISIBLE_DEVICES=<card indices>; the principle is the same as single-machine multi-GPU training.)
On the first machine: python -m torch.distributed.launch --nproc_per_node=1 --nnodes=2 --node_rank=0 --master_addr=<Master node IP address> --master_port=<Master node port number> train_gpu.py
On the second machine: python -m torch.distributed.launch --nproc_per_node=1 --nnodes=2 --node_rank=1 --master_addr=<Master node IP address> --master_port=<Master node port number> train_gpu.py
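Note: torch.distributed.launch is deprecated in recent PyTorch releases. If your installation provides torchrun, the equivalent multi-node commands are (same flags, adapt to your setup):
On the first machine: torchrun --nproc_per_node=1 --nnodes=2 --node_rank=0 --master_addr=<Master node IP address> --master_port=<Master node port number> train_gpu.py
On the second machine: torchrun --nproc_per_node=1 --nnodes=2 --node_rank=1 --master_addr=<Master node IP address> --master_port=<Master node port number> train_gpu.py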
@inproceedings{liu2023efficientvit,
title={EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention},
author={Liu, Xinyu and Peng, Houwen and Zheng, Ningxin and Yang, Yuqing and Hu, Han and Yuan, Yixuan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14420--14430},
year={2023}
}