Closed yallup closed 1 year ago
Ahh thanks David! Both of these things look super helpful. I had played around with jit compile a bit, as I had seen it lead to improvements in some other work I had been doing, but just hadn't got around to implementing it here. Should help solve #20.
This shouldn't be a breaking change but we will have to bump the version number up to 0.6.0.
@yallup need to add tqdm to the requirements.txt for the tests I think.
Hmm, yes, the numpy-to-tf conversions probably aren't helping with runtime. I think the speed-up from GPUs is limited for small models as well, because there is an overhead to running on a GPU.
Hi @yallup, have you had a chance to look at this? Be great to get these features merged in. Thanks!
For some reason, the clustering is causing a

```
ValueError: tf.function only supports singleton tf.Variables created on the first call. Make sure the tf.Variable is only created once or created outside tf.function. See https://www.tensorflow.org/guide/function#creating_tfvariables for more information.
```

error with `@tf.function(jit_compile=True)`. It happens on the `test_maf_clustering()` test, which is the only one that has `clustering=True` in the `MAF()`. If I turn this off, the test passes.
I think it's because tensorflow is trying to create the function (and its variables) multiple times in the for loop:

```python
for i in range(len(self.theta)):
    self.maf[i] = self._training(self.theta[i],
                                 self.sample_weights[i],
                                 self.maf[i], self.theta_min[i],
                                 self.theta_max[i])
```

which I can validate by running

```python
self.maf[0] = self._training(self.theta[0],
                             self.sample_weights[0],
                             self.maf[0], self.theta_min[0],
                             self.theta_max[0])
self.maf[1] = self._training(self.theta[1],
                             self.sample_weights[1],
                             self.maf[1], self.theta_min[1],
                             self.theta_max[1])
```

which gives the same error when it gets to `self.maf[1]`.
@yallup I've been trying to resolve this issue but couldn't manage it. I'm keen to take advantage of the tqdm addition though, so I have removed the `@tf.function(jit_compile=True)` decorator on the training step for now.

I will come back to this when I look at #31 again. I think I had the same error with the clustering and the `@tf.function` decorator there as well. If the tests pass, could you please squash and merge? Thanks!
Patch coverage: 100.00% and project coverage change: +0.02% :tada:

Comparison is base (ae63d2a) 88.93% compared to head (b34b8f9) 88.95%.
Totally agree on putting a pin in the speed-up stuff; it was just a little thing I was trying. I got a bit caught up trying out some other bits, such as reducing translations between tf and numpy operations, as that might be a bigger speed-up in the long run. One to revisit later, I think!
No worries! Thanks for having a look at it. I started the process of rewriting the MAF class line by line this evening to try and eliminate some of the back and forth between numpy/tensorflow and to add in some tf.function decorators where possible. Seems to be going well and hopefully that will help with run time issues. I'll put a PR in at some point.
A small quality-of-life request: a progress bar for training.
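A minimal sketch of what this could look like, wrapped around an epoch loop. This is illustrative, not the package's actual training code: the loss is a dummy stand-in, and the loop falls back to a plain `range` if tqdm is missing.

```python
# Illustrative training-epoch loop with a tqdm progress bar.
# The "loss" here is a placeholder, not a real training loss.
try:
    from tqdm import trange
except ImportError:
    trange = range  # graceful fallback: no bar, same loop semantics

losses = []
epochs = 5
pbar = trange(epochs)
for epoch in pbar:
    loss = 1.0 / (epoch + 1)  # stand-in for the per-epoch training loss
    losses.append(loss)
    if hasattr(pbar, "set_postfix"):
        pbar.set_postfix(loss=f"{loss:.4f}")  # live loss readout on the bar
print(len(losses))
```

Using `set_postfix` means the bar doubles as a live loss monitor, which is handy for long training runs.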
A second, small (but potentially breaking?) change: wrapping the train step in a jit wrapper. Naively this gives a factor-of-2 improvement in run time on both CPU and GPU as far as I can see, but it may need more careful checking on other hardware.
More broadly, I tried putting the training on a GPU and saw little speed-up. I started digging through the code and I think there are quite a lot of conversions to numpy and other non-TF operations, so at the moment there isn't much gain from GPU usage here, but the JIT compilation still looks advantageous.
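The jit-wrapped train step could look something like the sketch below. This is a toy example, not the package's code: the model is a single scalar weight fitted by SGD, and the names are made up. The point is just the shape of the pattern: variables created outside the `tf.function`, and the whole gradient step compiled with `jit_compile=True`. The import is guarded so the sketch can be skipped without TensorFlow.

```python
# Sketch of a jit-compiled train step (toy least-squares fit, w -> 3.0).
final = None
try:
    import tensorflow as tf
except ImportError:
    tf = None

if tf is not None:
    w = tf.Variable(2.0)  # variable created OUTSIDE the tf.function
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

    @tf.function(jit_compile=True)  # XLA-compile the whole step
    def train_step(x, y):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean((w * x - y) ** 2)
        grads = tape.gradient(loss, [w])
        opt.apply_gradients(zip(grads, [w]))
        return loss

    # Fit w so that w * x == y, i.e. w -> 3.0.
    for _ in range(50):
        train_step(tf.constant([1.0, 2.0]), tf.constant([3.0, 6.0]))
    final = float(w.numpy())
    print(final)
else:
    print("TensorFlow not installed; skipping sketch")
```

Keeping all ops inside the step as pure TF (no numpy round-trips) is what lets XLA fuse the computation, which is likely where the observed factor-of-2 comes from.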