Closed: monajalal closed this issue 6 years ago
It was slow, but the training eventually finished in ~30 hrs with the following results:
exp: vqa_gt_layout, iter = 200000
loss (vqa) = 0.050071, loss (layout) = 0.000070, loss (rec) = 0.000000, loss (sharpen) = 0.000000, sharpen_scale = 1.000000
accuracy (cur) = 0.968750, accuracy (avg) = 0.977994
snapshot saved to ./exp_clevr_snmn/tfmodel/vqa_gt_layout/00200000
How can I solve this problem (i.e., speed up the training)?
This might sound weird, but for us, training it on a Tesla P100 reduced it to 7 hrs, which is pretty good. That said, depending on what you want out of snmn, 30 hrs on a 1080 Ti is not that big a deal unless you need to change code frequently and retrain.
How many GPUs and what prefetch-num did you use? I also use a Tesla P100.
Hello~ I'd like to learn more details. Did you just git clone this code and run it in single-GPU mode (P100), and it only took about 7 hrs?
During training I keep getting the following message: "data reader: waiting for data loading (IO slow)".
Also, at the beginning I got the following message: "imdb does not contain bounding boxes".
Do you get these messages too? Or do you know how I could improve this?
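For what it's worth, that "IO slow" warning usually just means the training loop is consuming batches faster than the background data reader can load them, so the GPU stalls waiting on the prefetch queue. I haven't checked snmn's exact reader code, so the sketch below is only an illustration of the general producer/consumer pattern (all names and timings are made up); the prefetch-num style option controls the depth of such a queue:

```python
import queue
import threading
import time

def reader_thread(q, n_batches, load_time):
    """Producer: simulates a data reader that needs `load_time` s per batch."""
    for i in range(n_batches):
        time.sleep(load_time)  # pretend to read/preprocess a batch from disk
        q.put(i)
    q.put(None)  # sentinel: no more data

def train_loop(q):
    """Consumer: pulls batches; q.get() blocks ("IO slow") when the queue is empty."""
    batches = []
    while (batch := q.get()) is not None:
        batches.append(batch)
    return batches

# A larger maxsize (the prefetch depth) lets the reader run ahead of training,
# hiding disk latency; if it is too small, the consumer stalls on every batch.
prefetch_queue = queue.Queue(maxsize=8)
t = threading.Thread(target=reader_thread, args=(prefetch_queue, 20, 0.001))
t.start()
processed = train_loop(prefetch_queue)
t.join()
print(len(processed))
```

If the warning appears constantly, moving the dataset to a faster disk (SSD) or increasing the number of reader threads / prefetch depth in the config is usually what helps.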
python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml
Here is the output of nvidia-smi while the model is being trained.
Please let me know if you have any suggestions.