Hi all,
I'm trying to run multi-GPU training with the following command:
CUDA_VISIBLE_DEVICES=1,3 python ./jerex_train.py --config-path configs/docred_joint
After the run is launched, I see memory allocated on device 0 (i.e. CUDA 1), but not on device 1. I have tried with batch_size > 1 as well.
I guess some modifications are needed in the cfg file, specifically in the following section:
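I believe the relevant part is the distribution block of the train config. I'm sketching it roughly below; the field names (gpus, accelerator, prepare_data_per_node) and values are my guess based on the PyTorch Lightning trainer arguments, so they may not match the actual file exactly:

distribution:
  gpus: [0, 1]              # assumed: device indices (or a count) visible to the trainer
  accelerator: 'ddp'        # assumed: distributed backend to use for multi-GPU runs
  prepare_data_per_node: false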
How can this be solved? My environment is set up as described in your requirements.txt file.
Thanks.