I'm trying to use make_parallel() with Keras Xception and a generator that yields two classes, with batch_size=2.
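Roughly, the setup looks like this (a minimal sketch, not my exact code: the real data generator is simplified to random arrays, and make_parallel() is assumed to be the commonly circulated multi-GPU helper with signature make_parallel(model, gpu_count)):

```python
import numpy as np
from keras.applications.xception import Xception
from keras.optimizers import SGD

# Assumed import: make_parallel() from the commonly shared multi-GPU gist
from make_parallel import make_parallel

NUM_CLASSES = 2
BATCH_SIZE = 2  # with gpus=2 this leaves only 1 sample per GPU per step

def generator(batch_size=BATCH_SIZE):
    # Dummy stand-in for the real generator: yields (images, one-hot labels)
    while True:
        x = np.random.rand(batch_size, 299, 299, 3)
        y = np.eye(NUM_CLASSES)[np.random.randint(0, NUM_CLASSES, batch_size)]
        yield x, y

model = Xception(weights=None, classes=NUM_CLASSES)
model = make_parallel(model, 2)  # replicate the model across 2 GPUs
model.compile(optimizer=SGD(lr=0.01),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit_generator(generator(), steps_per_epoch=100, epochs=2)
```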
On a single GPU without make_parallel(), the model reaches loss=0, acc=1 within 2 epochs. However, with make_parallel() and gpus=2, the model gets stuck at acc=0.5 with loss=8.0591.
My guess is that this is somehow related to the loss being aggregated from only one GPU instead of both, but I'm not sure why.
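If it helps, my understanding is that make_parallel() (assuming the commonly shared gist version) slices each incoming batch across the GPUs and concatenates the per-GPU outputs back on the CPU, so the loss should in principle be computed over the full batch. A rough sketch of that helper:

```python
import tensorflow as tf
from keras.layers import Lambda, concatenate
from keras.models import Model

def make_parallel(model, gpu_count):
    # Sketch of the commonly shared helper, not the exact gist code:
    # replicate the model on each GPU, feed each replica 1/gpu_count
    # of the batch, and concatenate the per-GPU outputs on the CPU
    # so the loss covers the full batch.
    def get_slice(data, idx, parts):
        # Take the idx-th 1/parts slice along the batch dimension
        shape = tf.shape(data)
        size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
        start = tf.concat([shape[:1] // parts * idx, shape[1:] * 0], axis=0)
        return tf.slice(data, start, size)

    outputs_all = [[] for _ in model.outputs]
    for gpu in range(gpu_count):
        with tf.device('/gpu:%d' % gpu):
            inputs = [Lambda(get_slice,
                             arguments={'idx': gpu, 'parts': gpu_count})(x)
                      for x in model.inputs]
            outputs = model(inputs)
            if not isinstance(outputs, list):
                outputs = [outputs]
            for i, out in enumerate(outputs):
                outputs_all[i].append(out)

    # Merge the per-GPU outputs on the CPU before the loss is computed
    with tf.device('/cpu:0'):
        merged = [concatenate(outs, axis=0) for outs in outputs_all]
    return Model(inputs=model.inputs, outputs=merged)
```

Note that with batch_size=2 and gpus=2, this slicing means each replica only ever sees a single sample per step.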
With 4 classes and batch_size=4, the multi-GPU model reaches acc=0.97 after 11 epochs, whereas a single GPU reaches acc=1 within 2 epochs.
Any idea?