xzackli / Bolt.jl

differentiable Boltzmann code
MIT License

early nn stuff #89

Open jmsull opened 1 year ago

jmsull commented 1 year ago

Round 1 of very simple Adam opt plots on CDM at fixed background:

delta and v:

[plots: deltac_learning_v1_multnoise0.1_Adam80_1.0, vc_learning_v1_multnoise0.1_Adam80_1.0]

reconstructed delta', v':

[plots: deltacprime_learning_v1_multnoise0.1_Adam80_1.0, vcprime_learning_v1_multnoise0.1_Adam80_1.0]
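For orientation, here is a minimal sketch of the kind of training loop behind these plots, assuming the Flux.jl explicit-parameter API. The network shape and the data are hypothetical stand-ins; only the multiplicative noise level (0.1) and the Adam settings (80 iters at $\eta = 1.0$) are read off the plot filenames:

```julia
using Flux

# Hypothetical stand-in: a small MLP mapping (δc, vc) -> (δc', vc'),
# i.e. learning the RHS of the CDM perturbation equations at fixed background.
model = Chain(Dense(2 => 32, tanh), Dense(32 => 2))

# Fake training data with multiplicative noise at the 0.1 level,
# mimicking the "multnoise0.1" tag in the plot filenames.
X = randn(Float32, 2, 256)      # (δc, vc) samples
Ytrue = randn(Float32, 2, 256)  # placeholder (δc', vc') targets
Y = Ytrue .* (1 .+ 0.1f0 .* randn(Float32, size(Ytrue)))

# 80 iterations of Adam at η = 1.0, as in "Adam80_1.0".
opt_state = Flux.setup(Adam(1.0), model)
for iter in 1:80
    loss, grads = Flux.withgradient(m -> Flux.mse(m(X), Y), model)
    Flux.update!(opt_state, model, grads[1])
end
```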

jmsull commented 1 year ago

v and v' look pretty bad - lots of room to improve

Sorry, the title label on the second-to-last plot is wrong; it should say delta'.

jmsull commented 1 year ago

BTW this is a really long mode, $k \sim 0.003$.

jmsull commented 1 year ago

Now training with 50 iters of Adam at $\eta=1$, followed by 50 at $\eta=0.1$, 20 at $\eta=0.01$, and then 10 iters of BFGS (default hyperparameters). It looks a little better, especially in the solutions, maybe not so much in the reconstructions of $u'$.

[plots: deltac_learning_v1_multnoise0.1_Adam50_50_20_1.0_0.1_0.01_bfgs, vc_learning_v1_multnoise0.1_Adam50_50_20_1.0_0.1_0.01_bfgs]

Reconstruction:

[plots: deltacprime_learning_v1_multnoise0.1_Adam50_50_20_1.0_0.1_0.01_bfgs, vcprime_learning_v1_multnoise0.1_Adam50_50_20_1.0_0.1_0.01_bfgs]
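A sketch of that staged schedule, assuming the SciML Optimization.jl interface; the placeholder objective stands in for the real solve-the-ODE-and-compare loss:

```julia
using Optimization, OptimizationOptimisers, OptimizationOptimJL

# Hypothetical loss over parameters p; in practice this would solve the
# perturbation ODE with the NN RHS and compare to the noisy data.
loss(p, _) = sum(abs2, p)

optf = OptimizationFunction(loss, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, rand(10))

# Staged schedule from the comment: 50 @ η=1.0, 50 @ η=0.1, 20 @ η=0.01, then BFGS.
res = solve(prob, OptimizationOptimisers.Adam(1.0); maxiters = 50)
res = solve(remake(prob; u0 = res.u), OptimizationOptimisers.Adam(0.1); maxiters = 50)
res = solve(remake(prob; u0 = res.u), OptimizationOptimisers.Adam(0.01); maxiters = 20)
res = solve(remake(prob; u0 = res.u), OptimizationOptimJL.BFGS(); maxiters = 10)
```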

jmsull commented 1 year ago

The loss curve:

[plot: loss_learning_v1_multnoise0.1_Adam50_50_20_1.0_0.1_0.01_bfgs]

It looks like maybe BFGS is starting to just turn down? But the BFGS iters are super expensive (I suppose due to the Hessian approximation, even with forward diff, which I assume it is using for that). We should perhaps run this on something with more oomph than my laptop...
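For what it's worth, BFGS never computes a Hessian directly; it builds an inverse-Hessian approximation from successive gradient differences, so the per-iteration expense is mostly the line search, which can trigger several extra function and gradient evaluations. If we call Optim.jl directly, forward-mode gradients can be requested explicitly; a toy sketch (the objective is a stand-in):

```julia
using Optim

f(x) = sum(abs2, x .- 1)   # toy objective standing in for the real loss
x0 = zeros(10)

# BFGS updates its inverse-Hessian approximation from gradient differences;
# `autodiff = :forward` tells Optim to compute those gradients with ForwardDiff.
res = optimize(f, x0, BFGS(), Optim.Options(iterations = 10); autodiff = :forward)
```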

jmsull commented 1 year ago

Some other observations:

jmsull commented 1 year ago

Another thing I'm eager to try is adding more data and batching over $k$, which will be closer to what we want to do eventually...
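A hedged sketch of what batching over $k$ could look like with Flux's DataLoader; `loss_at_k` is a hypothetical placeholder for solving the hierarchy at a given $k$ and comparing to data:

```julia
using Flux

ks = exp10.(range(-3, -1, length = 64))   # hypothetical grid of k modes

# Placeholder per-mode loss; in practice this would solve the perturbation
# equations at wavenumber k with the current NN and compare to the data.
loss_at_k(model, k) = sum(abs2, model([Float32(k)]))

model = Chain(Dense(1 => 16, tanh), Dense(16 => 1))
opt_state = Flux.setup(Adam(1e-2), model)

for epoch in 1:10
    for kbatch in Flux.DataLoader(ks; batchsize = 8, shuffle = true)
        _, grads = Flux.withgradient(m -> sum(k -> loss_at_k(m, k), kbatch), model)
        Flux.update!(opt_state, model, grads[1])
    end
end
```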