irashavitt / regularization_learning_networks

Implementation of RLNs, as described in https://arxiv.org/abs/1805.06440
GNU General Public License v3.0

Question about empirical gradient calculation #1

Open AvantiShri opened 6 years ago

AvantiShri commented 6 years ago

Hello,

@avsecz and I saw your poster at ICML WCB and we were quite impressed by the work. However, we have a question about this line in the code:

https://github.com/irashavitt/regularization_learning_networks/blob/7b35e348017f7b837da579dbf097a9fc1de648c1/Implementations/Keras_implementation.py#L36

It seems that you are estimating the gradients empirically by looking at the difference in the weights between the current timestep and the previous timestep. Wouldn't this give you an estimate of the gradient w.r.t. a loss that has the regularization term included? From your paper, it seems clear that the counterfactual loss does not contain the regularization term, so we are wondering how you obtain an estimate of the gradient when the regularization term is excluded. For ease, the relevant formula from your paper is included below:

[Screenshot: the counterfactual loss formula from the paper]
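The concern can be made concrete with a one-dimensional toy sketch (all names and numbers here are illustrative, not from the repo): if the optimizer were minimizing an L2-regularized loss, a finite difference over the weights would recover the gradient of the *regularized* loss, not of the empirical loss alone.

```python
# Hypothetical 1-D illustration of the concern (assumed values):
lr = 0.1    # learning rate
lam = 0.5   # L2 coefficient
w = 2.0

emp_grad = 2 * (w - 1.0)   # d/dw of the empirical loss (w - 1)^2
reg_grad = 2 * lam * w     # d/dw of the regularizer lam * w^2

# One plain-SGD step on the REGULARIZED loss:
w_new = w - lr * (emp_grad + reg_grad)

# Gradient estimated from the weight difference:
est_grad = (w - w_new) / lr
# est_grad matches emp_grad + reg_grad, not emp_grad alone
```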
irashavitt commented 6 years ago

Hi @AvantiShri and @avsecz!

Sorry for the late response; I had an old email configured with GitHub...

You are correct, the counterfactual loss does not contain a regularization term. But in this implementation, the Keras model isn't passed a regularization term at all, so the `gradients` variable is the gradient of the empirical loss, without any regularization. You can see how it is used in the Keras tutorial: https://github.com/irashavitt/regularization_learning_networks/blob/master/Implementations/Keras_tutorial.ipynb
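To illustrate the point with a toy sketch (plain SGD assumed; variable names are mine, not the repo's): when the model's loss contains no regularizer, the step-to-step weight difference divided by the learning rate recovers exactly the empirical-loss gradient.

```python
lr = 0.1
w = 2.0

# Empirical loss L(w) = (w - 1)^2; the model's loss has NO regularizer.
emp_grad = 2 * (w - 1.0)

# One plain-SGD step taken by the framework:
w_new = w - lr * emp_grad

# Gradient recovered from the weight difference, as in the callback:
gradients = (w - w_new) / lr
```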

Regularization is applied through the `RLNCallback`, which shrinks the weights directly: https://github.com/irashavitt/regularization_learning_networks/blob/7b35e348017f7b837da579dbf097a9fc1de648c1/Implementations/Keras_implementation.py#L57-L60
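A minimal, dependency-free sketch of the idea, not the repo's actual `RLNCallback`: after each batch, each weight is pulled toward zero by its own per-weight regularization coefficient (soft-thresholding, i.e. a proximal L1 step). The class and parameter names below are illustrative assumptions.

```python
import numpy as np

class ShrinkWeightsCallback:
    """Illustrative sketch: apply regularization outside the loss by
    shrinking weights after each batch (names/update rule are assumed,
    not the repo's actual RLNCallback)."""

    def __init__(self, weights, lambdas, lr=0.01):
        self.weights = weights   # mutable weight array
        self.lambdas = lambdas   # per-weight L1 coefficients
        self.lr = lr

    def on_batch_end(self):
        # Soft-thresholding: move each weight toward zero by
        # lr * lambda, clipping at zero.
        shrink = self.lr * self.lambdas
        self.weights[...] = np.sign(self.weights) * np.maximum(
            np.abs(self.weights) - shrink, 0.0)

w = np.array([0.5, -0.2, 0.003])
cb = ShrinkWeightsCallback(w, lambdas=np.array([1.0, 1.0, 1.0]), lr=0.01)
cb.on_batch_end()
# Large weights shrink slightly; the tiny weight is zeroed out.
```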

This is a suboptimal implementation, and I hope to release a cleaner version in which the regularization terms are passed to the framework itself. That might need to be done in a more expressive framework than Keras.

I hope my late response did not discourage you from using RLNs! I should get notifications now, and in any case, feel free to email me if you have any questions!

Ira.