htjb / margarine

Code to replicate posterior probability distributions with bijectors/KDEs and perform marginal KL/Bayesian dimensionality calculations.
MIT License

change mean to sum #10

Closed BeichenXu closed 1 year ago

BeichenXu commented 2 years ago

I changed tf.reduce_mean to tf.reduce_sum in the loss calculation in the _train_step function in maf.py, so that the loss is of order 1e0 to 1e1 instead of a very small number, since the magnitude of the loss depends on the number of samples. I have attached plots of the different losses and gifs of how margarine trains with the different loss functions.

loss_sum_500.pdf train_loss.pdf 300_horizontal_new 300_horizontal
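
(For reference, a minimal sketch of the change described above; the exact _train_step body in maf.py may differ slightly, with x standing for the samples and w for the sample weights.)

with tf.GradientTape() as tape:
    # previously: loss = -tf.reduce_mean(w*self.maf.log_prob(x))
    loss = -tf.reduce_sum(w*self.maf.log_prob(x))  # scales with the number of samples
gradients = tape.gradient(loss, self.maf.trainable_variables)
self.optimizer.apply_gradients(zip(gradients, self.maf.trainable_variables))
return loss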

williamjameshandley commented 2 years ago

Cracking plots @BeichenXu. Are these started from the same seed? The loss function trace looks quantitatively different between the two cases if so.

htjb commented 2 years ago

@BeichenXu these are great graphs! Directly comparing loss functions isn't always the best thing to do, however, because they might be normalised differently or be in different "units" or "spaces", if that makes sense? Like comparing apples and bananas...

If we call the 'reduce_sum' loss Rs and the 'reduce_mean' loss Rm, then to compare the two I think you should plot Rs/N, where N is the number of samples. Alternatively, you can look at something which is independent of the loss and compare that. For example, how close is the KL divergence to the "true value" from 'anesthetic', say, or to the values in @williamjameshandley's 'quantifying tensions/dimensionalities...' papers? Does one loss function give a better estimate of the KL than the other?
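
(An illustrative sketch of the Rs/N rescaling suggested above; the array values and names are made up, only the relationship between the two losses is the point.)

import tensorflow as tf

# stand-in for w*log_prob(x) evaluated on a batch of samples
weighted_log_prob = tf.constant([-1.2, -0.8, -1.5, -0.9])
N = tf.cast(tf.size(weighted_log_prob), tf.float32)

Rs = -tf.reduce_sum(weighted_log_prob)   # 'sum' loss
Rm = -tf.reduce_mean(weighted_log_prob)  # 'mean' loss

# Rs/N equals Rm, so dividing the 'sum' loss trace by the number of
# samples puts the two training curves on the same scale for plotting
tf.debugging.assert_near(Rs / N, Rm)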

In any case, for this PR it would be good to add an optional kwarg for the loss function, set the default value to Rs or Rm (whichever appears to be better), and allow the user to change to the other if they like? What does @williamjameshandley think?

htjb commented 2 years ago

Oh, and I agree with @williamjameshandley that fixing the random seed is important! These are ace graphs though 🙂

BeichenXu commented 2 years ago

> Cracking plots @BeichenXu. Are these started from the same seed? The loss function trace looks quantitatively different between the two cases if so.

Yes, these plots are from the same seeds; I set np.random.seed(0) and tf.random.set_seed(0). I am still investigating the reason.

BeichenXu commented 2 years ago

I changed the gifs to the version which generates the uniform distribution first instead of sampling every time.

htjb commented 1 year ago

Hi @BeichenXu, in your PR could you add an additional argument to the train() function in maf.py, e.g. loss_type, and set its default to 'sum' please? i.e. train(self, epochs=100, early_stop=False, loss_type='sum'). Then pass this additional argument to _train_step() and make the following change to that function:

with tf.GradientTape() as tape:
    if loss_type == 'sum':
        loss = -tf.reduce_sum(w*self.maf.log_prob(x))
    elif loss_type == 'mean':
        loss = -tf.reduce_mean(w*self.maf.log_prob(x))
gradients = tape.gradient(loss, self.maf.trainable_variables)
self.optimizer.apply_gradients(
    zip(gradients,
        self.maf.trainable_variables))
return loss

Then just add a note to the docstring in train() that loss_type defaults to 'sum' but can be changed. This will give people the option to train with the mean but default to the sum. Thanks in advance! Let me know if there are any issues. 😄
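
(A minimal usage sketch of the proposed interface; it assumes margarine's MAF class is constructed as MAF(theta, weights), with theta and weights as placeholder names for the posterior samples and their weights.)

from margarine.maf import MAF
import numpy as np

theta = np.random.normal(size=(1000, 2))   # placeholder posterior samples
weights = np.ones(len(theta))              # placeholder sample weights

bij = MAF(theta, weights)
bij.train(epochs=100, early_stop=False, loss_type='sum')  # proposed default
# bij.train(epochs=100, loss_type='mean')                 # opt in to the mean loss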

BeichenXu commented 1 year ago

Hi @htjb, thank you for helping me. I have edited maf.py and pushed again; please check when you have time to see if there is anything that needs to be modified.

htjb commented 1 year ago

Thanks @BeichenXu, this is looking good. We just need to make sure that we pass the argument loss_type to the function _train_step(). So we need to edit the line loss = self._train_step(phi, weights_phi).numpy() in train() to loss = self._train_step(phi, weights_phi, loss_type).numpy(). Then change def _train_step(self, x, w, prior_phi=None, prior_weights=None): to def _train_step(self, x, w, loss_type, prior_phi=None, prior_weights=None):. Should be good to go then! Thanks in advance.
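
(Putting those two edits together, a sketch of the relevant lines; the rest of maf.py is elided.)

# in train(), pass the new argument through to _train_step()
loss = self._train_step(phi, weights_phi, loss_type).numpy()

# updated _train_step() signature
def _train_step(self, x, w, loss_type, prior_phi=None, prior_weights=None):
    ...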

BeichenXu commented 1 year ago

Thank you @htjb! I am sorry that I didn't pass the argument. I have edited maf.py again; please check.

htjb commented 1 year ago

No problem @BeichenXu, this looks good now thanks! Feel free to squash and merge the branch.

htjb commented 1 year ago

Thanks @BeichenXu! 😄