Closed. VectorYoung closed this issue 5 years ago.
I was interested in a multi-GPU implementation at the beginning, but at the time I did not have enough time to work on it.
IMHO, there is nothing fundamentally difficult about a multi-GPU implementation. The only problem is that the multi-GPU API in PyTorch is high-level. At a rough glance, it seems we only need to figure out how to parallelize `autograd.grad`. I will also look into this issue.
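To illustrate what I mean, here is a minimal toy sketch (not the actual search network) of the `autograd.grad` call in question, which `nn.DataParallel` does not cover:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the search network (not the actual DARTS model).
net = nn.Linear(8, 4)
x, y = torch.randn(16, 8), torch.randn(16, 4)

loss = F.mse_loss(net(x), y)
# autograd.grad returns gradients w.r.t. an explicit parameter list
# instead of accumulating into .grad; nn.DataParallel only parallelizes
# forward(), so this call still runs on a single device.
grads = torch.autograd.grad(loss, list(net.parameters()))
print([tuple(g.shape) for g in grads])  # [(4, 8), (4,)]
```

Since only `forward()` goes through DataParallel's scatter/replicate/gather path, calls like this in the architecture-update step would need to be parallelized by hand.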
@khanrc Thanks for your reply. I am new to PyTorch and know little about parallel computing. I just followed the PyTorch tutorial by adding `model = DataParallel(model)`, but it didn't work out. I suspect `virtual_step` is not parallelized, but I don't know how to fix it. Thanks a lot for your help.
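What I tried is roughly the following (a minimal sketch; `ToyNet` is a hypothetical stand-in for the search model):

```python
import torch
import torch.nn as nn

# Hypothetical toy network standing in for the DARTS search model.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)

model = ToyNet()
if torch.cuda.device_count() > 1:
    # Replicates the module across GPUs and splits the batch along dim 0,
    # but only for calls that go through forward().
    model = nn.DataParallel(model).cuda()

out = model(torch.randn(16, 8))
print(tuple(out.shape))  # (16, 4)
```

Wrapping alone doesn't seem to be enough, since anything that touches the parameters directly (like `virtual_step`) bypasses the wrapper.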
Hi @khanrc,
Thanks for the PyTorch DARTS implementation! I can run the single-GPU version of the code with no problem. However, the multi-GPU version just hangs forever with the way I'm calling it.
```
$ echo $CUDA_VISIBLE_DEVICES
0,1,2
```
`nvidia-smi` returns:

```
Mon May  6 16:44:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:05:00.0 Off |                  N/A |
| 34%   48C    P0    60W / 250W |      0MiB / 12194MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:06:00.0 Off |                  N/A |
| 29%   44C    P0    57W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:09:00.0 Off |                  N/A |
| 27%   42C    P0    57W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:0A:00.0 Off |                  N/A |
| 23%   38C    P8     9W / 250W |    159MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
I then run:

```
python search.py --name cifar10-mg --dataset cifar10 --gpus 0,1,2 --batch_size 256 --workers 16 --print_freq 10 --w_lr 0.1 --w_lr_min 0.004 --alpha_lr 0.0012
```
I then get the following output, and then it hangs forever:

```
05/06 04:45:40 PM |
05/06 04:45:40 PM | Parameters:
05/06 04:45:40 PM | ALPHA_LR=0.0012
05/06 04:45:40 PM | ALPHA_WEIGHT_DECAY=0.001
05/06 04:45:40 PM | BATCH_SIZE=256
05/06 04:45:40 PM | DATA_PATH=./data/
05/06 04:45:40 PM | DATASET=cifar10
05/06 04:45:40 PM | EPOCHS=50
05/06 04:45:40 PM | GPUS=[0, 1, 2]
05/06 04:45:40 PM | INIT_CHANNELS=16
05/06 04:45:40 PM | LAYERS=8
05/06 04:45:40 PM | NAME=cifar10-mg
05/06 04:45:40 PM | PATH=searchs/cifar10-mg
05/06 04:45:40 PM | PLOT_PATH=searchs/cifar10-mg/plots
05/06 04:45:40 PM | PRINT_FREQ=10
05/06 04:45:40 PM | SEED=2
05/06 04:45:40 PM | W_GRAD_CLIP=5.0
05/06 04:45:40 PM | W_LR=0.1
05/06 04:45:40 PM | W_LR_MIN=0.004
05/06 04:45:40 PM | W_MOMENTUM=0.9
05/06 04:45:40 PM | W_WEIGHT_DECAY=0.0003
05/06 04:45:40 PM | WORKERS=16
05/06 04:45:40 PM |
05/06 04:45:40 PM | Logger is set - training start
Files already downloaded and verified
```
```
####### ALPHA #######
# Alpha - normal
tensor([[0.1249, 0.1252, 0.1249, 0.1249, 0.1249, 0.1252, 0.1249, 0.1250],
        [0.1249, 0.1251, 0.1250, 0.1250, 0.1252, 0.1249, 0.1250, 0.1250]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1249, 0.1253, 0.1250, 0.1250, 0.1249, 0.1250, 0.1250, 0.1249],
        [0.1251, 0.1247, 0.1249, 0.1253, 0.1248, 0.1249, 0.1251, 0.1253],
        [0.1250, 0.1250, 0.1249, 0.1251, 0.1252, 0.1250, 0.1251, 0.1249]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1250, 0.1249, 0.1250, 0.1250, 0.1250, 0.1251, 0.1249, 0.1251],
        [0.1250, 0.1249, 0.1251, 0.1249, 0.1249, 0.1253, 0.1252, 0.1248],
        [0.1250, 0.1250, 0.1251, 0.1248, 0.1251, 0.1250, 0.1251, 0.1250],
        [0.1250, 0.1251, 0.1250, 0.1251, 0.1251, 0.1251, 0.1249, 0.1248]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1250, 0.1250, 0.1249, 0.1249, 0.1251, 0.1250, 0.1249, 0.1252],
        [0.1250, 0.1252, 0.1252, 0.1248, 0.1250, 0.1249, 0.1248, 0.1251],
        [0.1251, 0.1251, 0.1250, 0.1251, 0.1248, 0.1250, 0.1249, 0.1249],
        [0.1249, 0.1250, 0.1250, 0.1253, 0.1251, 0.1251, 0.1247, 0.1249],
        [0.1251, 0.1251, 0.1252, 0.1249, 0.1249, 0.1251, 0.1250, 0.1249]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
# Alpha - reduce
tensor([[0.1248, 0.1249, 0.1249, 0.1251, 0.1250, 0.1250, 0.1251, 0.1251],
        [0.1250, 0.1248, 0.1249, 0.1252, 0.1249, 0.1250, 0.1249, 0.1251]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1251, 0.1249, 0.1252, 0.1249, 0.1250, 0.1250, 0.1249, 0.1250],
        [0.1250, 0.1249, 0.1250, 0.1251, 0.1251, 0.1249, 0.1250, 0.1251],
        [0.1250, 0.1250, 0.1250, 0.1249, 0.1250, 0.1250, 0.1250, 0.1252]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1251, 0.1247, 0.1249, 0.1252, 0.1252, 0.1249, 0.1251, 0.1249],
        [0.1250, 0.1250, 0.1250, 0.1251, 0.1253, 0.1249, 0.1249, 0.1248],
        [0.1250, 0.1252, 0.1250, 0.1250, 0.1251, 0.1248, 0.1251, 0.1249],
        [0.1250, 0.1251, 0.1251, 0.1249, 0.1248, 0.1251, 0.1250, 0.1250]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1250, 0.1249, 0.1251, 0.1251, 0.1251, 0.1249, 0.1250, 0.1250],
        [0.1252, 0.1249, 0.1251, 0.1251, 0.1250, 0.1249, 0.1249, 0.1249],
        [0.1249, 0.1250, 0.1251, 0.1250, 0.1249, 0.1250, 0.1250, 0.1252],
        [0.1250, 0.1249, 0.1250, 0.1251, 0.1250, 0.1251, 0.1250, 0.1249],
        [0.1249, 0.1251, 0.1250, 0.1251, 0.1248, 0.1251, 0.1249, 0.1251]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
#####################
```
Do you know if I'm doing something wrong?
For anyone else hitting this: I find that sometimes restarting the process clears the issue above.
Hi, thanks for the nice implementation. I tried modifying the code to support multi-GPU, but it didn't work out; I don't know how to parallelize the Architect. Do you have any suggestions, or are you planning to add a multi-GPU feature? Thanks for your help.
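For context, the step I'm stuck on is the Architect's virtual step, which boils down to roughly this (a simplified single-device sketch with placeholder modules, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder modules standing in for the search network and its
# "virtual" copy; the real Architect does this over the whole model.
net = nn.Linear(8, 4)
v_net = nn.Linear(8, 4)
xi = 0.01  # virtual learning rate

x, y = torch.randn(16, 8), torch.randn(16, 4)
loss = F.mse_loss(net(x), y)
grads = torch.autograd.grad(loss, list(net.parameters()))

# Virtual step: w' = w - xi * dL_train/dw, written into the copy.
# Momentum and weight decay are omitted for brevity.
with torch.no_grad():
    for w, vw, g in zip(net.parameters(), v_net.parameters(), grads):
        vw.copy_(w - xi * g)
```

Because this writes straight into the parameter tensors, DataParallel's forward-only replication never sees it, which I suspect is why naively wrapping the model doesn't help.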