Closed. haamoon closed this issue 6 years ago.
Hi Haamoon,
I believe the issue is caused by the (different) random initialization of the model weights. Each time you run either run_fwd or run, the model parameters are initialized according to the random initializer of tensorflow.contrib.layers.fully_connected (by default this should be xavier_initializer, I believe). This makes the hypergradients different (as they should be). If you initialize the parameters (the weights of the network) with a constant value, the issue should be solved. (I did not write the code with the idea of using the two methods together, so it is possible that some auxiliary variable is wrongly reinitialized when you call the other method. I will check!)
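For example, a minimal sketch of pinning the initialization, using the standard weights_initializer / biases_initializer arguments of tf.contrib.layers.fully_connected (the layer sizes and constant values here are arbitrary placeholders):

```python
import tensorflow as tf

def build_model(x, n_hidden=32, n_out=10):
    # Constant initializers make the model identical across separate
    # constructions, so ForwardHG and ReverseHG differentiate the same
    # inner dynamics starting from the same point in parameter space.
    const_w = tf.constant_initializer(0.01)
    const_b = tf.constant_initializer(0.0)
    h = tf.contrib.layers.fully_connected(
        x, n_hidden,
        weights_initializer=const_w,
        biases_initializer=const_b)
    return tf.contrib.layers.fully_connected(
        h, n_out, activation_fn=None,
        weights_initializer=const_w,
        biases_initializer=const_b)
```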
Another thing: by calling hypergradient.hgrads_hvars, new nodes are added to the graph to perform the final computations for the hypergradients, which is not necessary since those nodes were already created by the HyperOptimizer.minimize function. To retrieve the list of hypergradients, use far.hypergradients().
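For instance, a small sketch of retrieving the hypergradient tensors, assuming HyperOptimizer.minimize has already been called, a default session is active, and the run callable has been executed so the values are available to fetch:

```python
import tensorflow as tf
import far_ho as far

# HyperOptimizer.minimize has already added the hypergradient nodes to
# the graph, so this only retrieves them instead of rebuilding them
# (as calling hypergradient.hgrads_hvars would).
hgrad_tensors = far.hypergradients()

# Assumes run(...) was executed inside the current default session.
hgrad_values = tf.get_default_session().run(hgrad_tensors)
print(hgrad_values)
```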
Let me know if this helps!
Cheers, Luca
You are right: with a fixed initialization the gradients are the same, and I could access the hyper-gradients with the far.hypergradients method. Thanks for the help, Luca!
Hi,
I wrote the following code to compare the hyper-gradients computed by the ReverseHG and ForwardHG methods in the same file:
They receive identical inputs and compute the hyper-gradient for the same hyper-variable (_skip_hyper_ts=True, so the hyper-parameter remains unchanged), but for some reason their outputs are quite different. I noticed that if I run them in separate files (with a fixed random seed), or run the ForwardHG block before the ReverseHG block, their outputs are similar. I cannot see how the Reverse and Forward hyper-gradient computations can affect each other, as they do not share any variables. Could you please explain how these two methods can be run in the same file?
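For concreteness, here is a minimal sketch of the kind of setup described above, with a ForwardHG and a ReverseHG HyperOptimizer built in the same graph. It is not the attached cp.py: the toy losses, the model variable w, and the hyperparameter lambda are placeholders, and the FAR-HO calls (far.get_hyperparameter, far.GradientDescentOptimizer, the minimize argument order, and the _skip_hyper_ts flag of the returned run callable) are assumed from typical FAR-HO usage as referenced in this thread.

```python
import tensorflow as tf
import far_ho as far

# Toy inner/outer objectives; in the real experiment both HyperOptimizers
# would see the same data and the same (constantly initialized) model.
w = tf.get_variable('w', initializer=1.0)        # model parameter
lmbd = far.get_hyperparameter('lambda', 0.1)     # hyper-parameter
inner_loss = tf.reduce_mean((w - lmbd) ** 2)
outer_loss = tf.reduce_mean(w ** 2)

io_optim_f = far.GradientDescentOptimizer(0.1)
io_optim_r = far.GradientDescentOptimizer(0.1)
oo_optim = tf.train.GradientDescentOptimizer(0.01)

# One HyperOptimizer per hypergradient mode, built in the same graph.
run_fwd = far.HyperOptimizer(hypergradient=far.ForwardHG()).minimize(
    outer_loss, oo_optim, inner_loss, io_optim_f)
run_rev = far.HyperOptimizer(hypergradient=far.ReverseHG()).minimize(
    outer_loss, oo_optim, inner_loss, io_optim_r)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    # _skip_hyper_ts=True keeps the hyper-parameter unchanged, so both
    # modes should be differentiating at the same hyper-parameter value.
    run_fwd(10, _skip_hyper_ts=True)
    print('forward:', sess.run(far.hypergradients()))
    run_rev(10, _skip_hyper_ts=True)
    print('reverse:', sess.run(far.hypergradients()))
```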
I have also attached the complete Python code for this experiment.
cp.py.zip