keras-team / keras-contrib

Keras community contributions
MIT License

100% original Adam that allows training 2x or 3x bigger networks #478

Closed: iperov closed this 5 years ago

iperov commented 5 years ago

- What I did

added a 100% original Adam optimizer with a new option

tf_cpu_mode: only for the TensorFlow backend
0 - default, no changes.
1 - allows training a 2x bigger network with the same VRAM, consuming extra RAM.
2 - allows training a 3x bigger network with the same VRAM, consuming 2x the RAM plus extra CPU power.

Batch size is a very important parameter for GAN training. By moving the optimizer's weights out of VRAM, we can train with a larger batch size, sacrificing 10-20% of the time per iteration.
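The mechanism, roughly, is to create Adam's moment accumulators (m and v) under a `tf.device('/cpu:0')` scope so they live in host RAM rather than VRAM; the per-iteration overhead presumably comes from the extra host/device transfers. A minimal sketch of the idea, assuming the TensorFlow backend; `make_adam_slots` and its `cpu_mode` argument are illustrative names, not the PR's actual code:

```python
import tensorflow as tf

def make_adam_slots(params, cpu_mode=1):
    """Create Adam's m/v accumulators for each parameter tensor.

    cpu_mode=0 leaves placement to TensorFlow (usually the GPU, next to
    the parameters); cpu_mode>=1 pins the accumulators to the CPU,
    trading VRAM for RAM plus some host<->device transfer per step.
    """
    device = '/cpu:0' if cpu_mode >= 1 else None
    with tf.device(device):
        # Non-trainable state: first and second moment estimates.
        ms = [tf.Variable(tf.zeros_like(p), trainable=False) for p in params]
        vs = [tf.Variable(tf.zeros_like(p), trainable=False) for p in params]
    return ms, vs
```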

- How I did it

I discovered it accidentally.

- How you can verify it

add tf_cpu_mode=1 to Adam and try a 2x bigger network
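For example, assuming the PR's Adam is exposed through keras_contrib.optimizers (a hypothetical import path), a quick check could look like:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras_contrib.optimizers import Adam  # hypothetical import path for this PR's Adam

model = Sequential([Dense(4096, activation='relu', input_shape=(1024,)),
                    Dense(10, activation='softmax')])

# tf_cpu_mode=1 keeps Adam's m/v state in host RAM instead of VRAM,
# so roughly the same GPU memory fits a ~2x larger model or batch.
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-3, tf_cpu_mode=1))
```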

RaphaelMeudec commented 5 years ago

@iperov Hello, thanks for your PR! However, I don't think this should be a keras-contrib feature. A fix for the Adam optimizer should be integrated directly into Keras to avoid duplication between keras and keras-contrib. Please consider opening a PR on keras-team/keras!

iperov commented 5 years ago

> A fix for the Adam optimizer should be integrated directly into Keras

I don't think so, because Keras is a standard that other backends should implement, and Theano, PlaidML, and the rest cannot implement placing tensors on the CPU and operating on them there.

Close it, then.

RaphaelMeudec commented 5 years ago

Adam is already in Keras here. Keras-contrib is just an extension of Keras, meant to test new features before they are eventually integrated into Keras itself.