pzxdd opened this issue 6 years ago
Did you use it in the training script?
@tanakataiki yes, I added the one line in train.py mentioned above.
Well, I have the same problem in training, and I am going to work on it when I have time.
do you have any clue?
@tanakataiki @qqwweee
Hmm... I am not quite sure, but I think we need to decide the input image size and batch size before the network construction in this case?
You should not use Concatenate but Add, because the last layer's output is a shape=(1,) loss tensor that Concatenate cannot handle; the outputs from the 3 single GPUs need to be added together to get their sum.
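A minimal sketch of that idea, assuming each GPU tower already emits a (1,)-shaped loss tensor (replica_losses and num_gpus are hypothetical names, not from this thread):

from keras.layers import Add, Lambda

# replica_losses: hypothetical list of (1,)-shaped yolo_loss outputs, one
# per GPU tower. Add sums them elementwise, which works where Concatenate
# on scalar outputs fails.
total_loss = Add(name='yolo_loss')(replica_losses)

# optionally average, so the loss scale does not depend on the GPU count
mean_loss = Lambda(lambda t: t / num_gpus)(total_loss)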
Although I don't think it is the best method, it worked this way. Rewrite it as follows:
return K.expand_dims(loss, axis=0)
https://github.com/qqwweee/keras-yolo3/blob/da7d756b0e47b979e701f0131ba7074ea138add8/train.py#L73
'yolo_loss': lambda y_true, y_pred: y_pred[0]
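Putting the two edits together, a sketch against qqwweee/keras-yolo3 (the elided body of yolo_loss is the repo's existing computation, unchanged):

# model.py -- have yolo_loss return a (1,)-shaped tensor instead of a
# scalar, so that multi_gpu_model's Concatenate can stack the replicas
def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False):
    ...  # the repo's existing loss computation, unchanged
    return K.expand_dims(loss, axis=0)  # was: return loss

# train.py (the linked line) -- the model's output is already the loss;
# y_pred[0] recovers a scalar from the concatenated (gpus,)-shaped output
model.compile(optimizer=Adam(lr=1e-3),
              loss={'yolo_loss': lambda y_true, y_pred: y_pred[0]})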
I also have this problem, sad…
I followed your example @nakasu, and it got rid of the error messages. However, I don't really see an improvement in batch speed. Were you able to get significant speed improvements with these? I wonder if I need to change the generator in some way to feed the data appropriately.
This is what I get when I call parallel_model.summary() with gpus=4. It doesn't seem correct, as each input layer is split weirdly into 4 layers...
parallel_model.summary()
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, None, None, 3 0
input_2 (InputLayer) (None, 25, 25, 3, 6) 0
input_3 (InputLayer) (None, 50, 50, 3, 6) 0
input_4 (InputLayer) (None, 100, 100, 3, 0
lambda_1 (Lambda) (None, None, None, 3 0 input_1[0][0]
lambda_2 (Lambda) (None, 25, 25, 3, 6) 0 input_2[0][0]
lambda_3 (Lambda) (None, 50, 50, 3, 6) 0 input_3[0][0]
lambda_4 (Lambda) (None, 100, 100, 3, 0 input_4[0][0]
lambda_5 (Lambda) (None, None, None, 3 0 input_1[0][0]
lambda_6 (Lambda) (None, 25, 25, 3, 6) 0 input_2[0][0]
lambda_7 (Lambda) (None, 50, 50, 3, 6) 0 input_3[0][0]
lambda_8 (Lambda) (None, 100, 100, 3, 0 input_4[0][0]
lambda_9 (Lambda) (None, None, None, 3 0 input_1[0][0]
lambda_10 (Lambda) (None, 25, 25, 3, 6) 0 input_2[0][0]
lambda_11 (Lambda) (None, 50, 50, 3, 6) 0 input_3[0][0]
lambda_12 (Lambda) (None, 100, 100, 3, 0 input_4[0][0]
lambda_13 (Lambda) (None, None, None, 3 0 input_1[0][0]
lambda_14 (Lambda) (None, 25, 25, 3, 6) 0 input_2[0][0]
lambda_15 (Lambda) (None, 50, 50, 3, 6) 0 input_3[0][0]
lambda_16 (Lambda) (None, 100, 100, 3, 0 input_4[0][0]
model_3 (Model) (None, 1) 61576342 lambda_1[0][0]
lambda_2[0][0]
lambda_3[0][0]
lambda_4[0][0]
lambda_5[0][0]
lambda_6[0][0]
lambda_7[0][0]
lambda_8[0][0]
lambda_9[0][0]
lambda_10[0][0]
lambda_11[0][0]
lambda_12[0][0]
lambda_13[0][0]
lambda_14[0][0]
lambda_15[0][0]
lambda_16[0][0]
yolo_loss (Concatenate) (None, 1) 0 model_3[1][0]
model_3[2][0]
model_3[3][0]
model_3[4][0]
Total params: 61,576,342
Trainable params: 32,310
Non-trainable params: 61,544,032
Same here. I got 4 GPUs running but no gain in training time.
Adding multi_gpu_model on yolo_body (before yolo_loss) may work:

from keras.utils import multi_gpu_model

model_body = tiny_yolo_body(image_input, num_anchors//2, num_classes)
if gpus > 1:
    model_body = multi_gpu_model(model_body, gpus=gpus)
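Continuing that snippet, the single yolo_loss layer then consumes the (possibly parallel) body's merged predictions, as in the repo's create_tiny_model(); a sketch with names from train.py:

# yolo_loss stays single; it takes the merged predictions from all replicas
model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
                    arguments={'anchors': anchors, 'num_classes': num_classes,
                               'ignore_thresh': 0.7})(
                   [*model_body.output, *y_true])
model = Model([model_body.input, *y_true], model_loss)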
I can confirm that @nakasu's solution works (assuming you've applied this).
@leirobertshi, @mazatov did you increase the batch size to account for your number of GPUs?
if gpus > 1:
batch_size *= gpus
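One related knob (a sketch; names follow the repo's train.py): the speedup shows up as fewer, larger steps per epoch, so steps_per_epoch must be derived from the scaled batch size:

batch_size = 32                 # per-GPU batch size
if gpus > 1:
    batch_size *= gpus          # multi_gpu_model splits each batch across the GPUs
steps_per_epoch = max(1, num_train // batch_size)   # fewer, bigger steps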
Thanks, I'll try this out. I don't think I was changing the batch_size!
@nakasu Although this method worked, I don't think this solution works correctly: multi_gpu_model concatenates the losses computed from each sub-batch, and in the model.compile step the loss function only selects the first sub-batch's loss and ignores the others.
@FlyEgle I tried your method in Docker, but I don't know what caused the program to get stuck at epoch 1 until the pipe timed out, with no response at all.
When I try to use
model = multi_gpu_model(model, gpus=3)
on my data, an error occurs: tensorflow.python.framework.errors_impl.InvalidArgumentError: Can't concatenate scalars (use tf.stack instead) for 'yolo_loss_1/concat' (op: 'ConcatV2') with input shapes: [], [], [], [].
My environment is tensorflow-1.8-gpu, Keras 2.2.0, Titan XP. Please help me fix it, thanks!
My environment is tensorflow-1.8-gpu, Keras 2.2.0, NVIDIA V100, but I cannot run on GPU even though the GPU memory has been taken up. By the way, it can run on CPU.
So I want to know whether you can run on GPU (tensorflow-1.8-gpu, Keras 2.2.0).
Thanks!
@power630 Hi, I tried the method you said, and it works for me. Thank you!

add multi_gpu_model on yolo_body (before yolo_loss) may work:
model_body = tiny_yolo_body(image_input, num_anchors//2, num_classes)
if gpus > 1:
    model_body = multi_gpu_model(model_body, gpus=gpus)
Unfortunately, it only works in the training stage. The final weights cannot be loaded directly by model.load_weights.
... @zhuolyang

add multi_gpu_model on yolo_body (before yolo_loss) may work:
model_body = tiny_yolo_body(image_input, num_anchors//2, num_classes)
if gpus > 1:
    model_body = multi_gpu_model(model_body, gpus=gpus)

Does it mean: do not use multi_gpu_model on yolo_loss?
Yes, the complete network contains multiple bodies and a single loss. It works in the training process. However, the saved model CANNOT be loaded by model.load_weights directly.
@power630 Not being able to load the model is a big problem. Is there any method to save and load the model?
I haven't solved it yet. I found this, maybe it helps:
https://www.bountysource.com/issues/60494331-the-multi_gpu_model-problem
https://www.bountysource.com/issues/60494331-the-multi_gpu_model-problem
Open it with a browser; clicking it directly doesn't seem to work.
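For the save/load problem discussed above, the workaround recommended in the Keras docs for multi_gpu_model is to keep a reference to the template (single-GPU) model and save/load weights through it, since the two share weights; a sketch, where create_model stands in for the repo's model builder:

from keras.utils import multi_gpu_model

template_model = create_model(...)   # hypothetical single-GPU builder
parallel_model = multi_gpu_model(template_model, gpus=4)
parallel_model.compile(optimizer='adam',
                       loss={'yolo_loss': lambda y_true, y_pred: y_pred})

# train with parallel_model, but checkpoint the template model
template_model.save_weights('trained_weights.h5')

# later, a fresh single-GPU model can load the weights directly
single_model = create_model(...)
single_model.load_weights('trained_weights.h5')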
@pzxdd @tanakataiki @nakasu @boyliwensheng @ladybirdhui this YOLOv3 tutorial may help you: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data
The accompanying repository works on macOS, Windows, and Linux, includes multi-GPU and multithreading support, performs inference on images, videos, and webcams, and has an iOS app. It also tests to slightly higher mAPs than darknet, including on the latest YOLOv3-SPP.weights (60.7 COCO mAP), and offers the ability to train custom datasets from scratch to darknet performance, all using PyTorch :)
https://github.com/ultralytics/yolov3
I have pushed usable code for training YOLOv3 with multi_gpu_model and multiple backbones; please visit https://github.com/anvien/Multi-YOLOv3
@nakasu Although this method worked, I don't think this solution works correctly: multi_gpu_model concatenates the losses computed from each sub-batch, and in the model.compile step the loss function only selects the first sub-batch's loss and ignores the others.
Why do you think model.compile will select only the first sub-batch's loss? I assumed that from multi_gpu_model we would receive an aggregated and averaged output, or does the aggregation happen in model.compile?
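If each replica contributes one (1,)-shaped loss, the concatenated output has shape (gpus,) with one entry per GPU, so y_pred[0] keeps only replica 0's term. Averaging over the stacked entries would use every replica; a sketch of that variant (an assumption about the output layout, not a tested fix):

from keras import backend as K

model.compile(optimizer=Adam(lr=1e-3),
              loss={'yolo_loss': lambda y_true, y_pred: K.mean(y_pred)})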