kessler-frost opened this issue 6 years ago
Hi @kessler-frost
In our experiments, optimizing C means optimizing only ~1,000 parameters, so a GPU would not add significant value, while training C in parallel across 64 CPU cores added a lot of value. In terms of cost, renting a 64-core CPU machine on Google Cloud is roughly the same price as renting a single GPU. Training C can take days or weeks depending on how well we want to do, so if you want to train C with parallel GPUs, the cost will add up.
Meanwhile, GPUs were used to train V and M (on a single-GPU virtual machine) in less than a day.
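To illustrate the point about CPU parallelism: an evolution-strategy loop spends almost all its time running independent environment rollouts, one per candidate parameter vector, and those rollouts parallelize across cores rather than across GPU matrix units. Below is a minimal sketch of that ask/evaluate/tell shape (a plain (mu, lambda) ES rather than full CMA-ES, which adds covariance adaptation on top of the same loop). The `rollout_reward` function is a hypothetical stand-in: a toy quadratic replaces a real environment rollout so the sketch runs anywhere, and the dimensions and hyperparameters are illustrative, not the paper's.

```python
import numpy as np

# Hypothetical stand-in for the expensive part of training C: one full
# environment rollout of the controller, returning cumulative reward.
# A toy quadratic is used here so the sketch is self-contained.
def rollout_reward(params):
    return -float(np.sum(params ** 2))

def train_controller(n_params=300, pop_size=64, iters=30, seed=0):
    """Simple (mu, lambda) ES loop; CMA-ES has the same ask/evaluate/tell
    structure, plus covariance-matrix adaptation."""
    rng = np.random.default_rng(seed)
    mean = rng.normal(size=n_params)
    sigma = 0.5
    for _ in range(iters):
        # "ask": sample a population of candidate parameter vectors
        pop = [mean + sigma * rng.normal(size=n_params)
               for _ in range(pop_size)]
        # Evaluation is embarrassingly parallel: each candidate's rollout
        # is independent, so in practice this map would be replaced by
        # something like multiprocessing.Pool(64).map(rollout_reward, pop),
        # one rollout per CPU core. The parameter vector itself is tiny
        # (~1000 floats), so GPU matrix throughput buys nothing here.
        rewards = list(map(rollout_reward, pop))
        # "tell": move the mean toward the best quarter of the population
        elite = np.argsort(rewards)[-pop_size // 4:]
        mean = np.mean([pop[i] for i in elite], axis=0)
        sigma *= 0.95
    return mean

best = train_controller()
```

The bottleneck is the rollouts, not the parameter updates, which is why 64 cores help and a GPU mostly idles.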
Thank you for the explanation!
I searched around for a while but couldn't find a clear reason for using CPU cores for CMA Evolution Strategy, when it is well known that GPUs perform much better where matrices and high levels of parallelism are involved. So my question is: why did you use CPUs instead of GPUs to train the C model, when you used GPUs for the V and M models anyway? Pardon me if this question seems too naive. Thanks.