meneguzzi opened this issue 5 years ago
Hi @meneguzzi,
Thanks for letting me know that.
I haven't tested tf-plan on a GPU-based platform yet, so, to be honest, at this point I can only speculate about why the times are so different.
I don't know if you are relying on automatic device placement, but if so, it may be that some of the computations carried out during tf-plan training are simply not efficient to execute on GPUs. Please note that the computations in tf-plan are domain-dependent and therefore potentially very different in nature and structure from typical NN inference/training computations.
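One way to check where each op actually lands is to enable device-placement logging. Here is a minimal sketch using the TF1-style API (via `tf.compat.v1`, so it also runs under TF2); the tiny graph below is just a stand-in for tf-plan's actual computation:

```python
# Sketch: log the device each TensorFlow op is placed on (TF1-style API).
# The graph built here is a placeholder; in practice you would run the
# planner's own session with this config.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

config = tf.ConfigProto(log_device_placement=True)  # print op-to-device mapping

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c = a + b

with tf.Session(config=config) as sess:
    result = sess.run(c)  # the log shows whether ops landed on /gpu:0 or /cpu:0
```

The placement log goes to stderr, so you can grep it for `/gpu:0` to see which ops TensorFlow actually assigned to the GPU.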
At some point it would be nice to profile tf-plan's execution with and without GPUs and compare/visualize the traces in TensorBoard, to try to pinpoint which parts of the graph take more time on a GPU.
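With the TF1 API, one way to do this is to collect run metadata with a full trace and dump it as a Chrome trace. A sketch, assuming the `tf.compat.v1` API; the `matmul` graph is a hypothetical placeholder for tf-plan's training step:

```python
# Sketch: trace one session.run call and dump a Chrome-format timeline
# (viewable at chrome://tracing, or loadable alongside TensorBoard data).
import tensorflow.compat.v1 as tf
from tensorflow.python.client import timeline

tf.disable_eager_execution()

x = tf.random.normal([1000, 1000])
y = tf.matmul(x, x)  # placeholder op; in practice, the planner's training step

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(y, options=run_options, run_metadata=run_metadata)

# Convert the collected step stats into a Chrome trace file.
trace = timeline.Timeline(run_metadata.step_stats)
with open("timeline.json", "w") as f:
    f.write(trace.generate_chrome_trace_format())
```

Running this once with GPUs visible and once without, then diffing the two timelines, should show which ops dominate in each setting.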
Unfortunately, I won't have time to do this anytime soon. But let me know if you are willing to invest some time and need any help, or if you have any other suggestions or considerations.
Hi @thiagopbueno,
I'm also working with @ramonpereira and @miquelramirez, and I have been trying to run tf-plan on a Linux box with GPUs. However, in our experiments (the same domains as in issue #2), running the planner with tensorflow-gpu installed instead of plain tensorflow takes substantially longer.
Running the example, I get two different times at the end of the process, depending on whether I'm running with GPUs or not. First, with GPUs (using either 1 or 3 GPUs does not change the times substantially):
Whereas the times when running on CPUs only are:
This is really weird to me, as the GPU runs take almost 3 times longer than the CPU runs.
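For what it's worth, a clean way to compare the two configurations without reinstalling TensorFlow is to hide all GPUs via the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported. A minimal sketch (the planner invocation itself is whatever you normally run):

```python
# Sketch: force a CPU-only run of a TF program without uninstalling
# tensorflow-gpu, by hiding all CUDA devices before TensorFlow loads.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""  # empty string => no GPUs visible

# ...import tensorflow and run the planner as usual afterwards...
```

The same effect from the shell: prefix the usual tf-plan command with `CUDA_VISIBLE_DEVICES=""`. This makes CPU-vs-GPU timing comparisons reproducible with a single installed TensorFlow build.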
I don't know if I would classify this as a bug; I'd consider it a request for enhancement. Given that many of the domains one would model in RDDL are very complex, it would be great to be able to fully leverage TensorFlow's GPU acceleration.