I started working on this, but will take care of #18 at the same time. The purpose of all this is to see why the current BFGS optimizer gets stuck in "obviously bad" solutions.
Now restructuring the training to support choosing the algorithm (or several, in fact) at a high level (in the megalut wrapper).
@kuntzer I've made PRs for both tenbilac and megalut. They go together (you need both), but I guess that's ok given our current stage of experimentation.
Okay, I'm looking at it now
Merged. It's still a mess to pass arguments through the megalut tenbilacwrapper, but at least this makes it cleaner to write e.g. your own hard-coded optimizer (for a specific task) and to use it.
I have in mind something brute-force that respects the ideas that our networks are small, that we start from something already reasonable, and that "sparse" changes can have a huge effect. Example: loop through all params, try out -1, -0.5, +0.5, +1 on each, and then perform a single step that changes just the most helpful parameter. A rough sketch of this idea follows below.
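For illustration only, here is a minimal sketch of what one such brute-force pass could look like. The function `brute_force_step` and the `cost` callable are hypothetical names, not part of tenbilac; this is just to make the idea concrete.

```python
import numpy as np

def brute_force_step(params, cost, offsets=(-1.0, -0.5, 0.5, 1.0)):
    """One pass of the brute-force idea: loop through all parameters,
    try each offset on one parameter at a time, and keep only the single
    change that lowers the cost the most (or no change at all)."""
    best_params = np.array(params, dtype=float)
    best_cost = cost(best_params)
    for i in range(len(best_params)):
        for offset in offsets:
            trial = np.array(params, dtype=float)
            trial[i] += offset  # change just this one parameter
            trial_cost = cost(trial)
            if trial_cost < best_cost:
                best_params, best_cost = trial, trial_cost
    return best_params, best_cost


if __name__ == "__main__":
    # Toy demonstration on a simple quadratic cost (stand-in for the network error):
    target = np.array([0.3, -1.2, 0.7])
    cost = lambda p: np.sum((p - target) ** 2)
    params = np.zeros(3)
    for _ in range(5):
        params, c = brute_force_step(params, cost)
        print(params, c)
```

In practice one would of course plug in the actual network error function as `cost` and probably shrink the offsets as the search stalls, but the above shows the "single sparse change per step" behaviour.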