trthatcher / MLKernels.jl

Machine learning kernels in Julia.
http://trthatcher.github.io/MLKernels.jl/dev/
MIT License

Kernel Derivatives #46

Open trthatcher opened 7 years ago

trthatcher commented 7 years ago

There are two components to this enhancement.

Optimization

Define a theta function and an eta (inverse theta) function to transform parameters from an open bounded interval to a closed bounded interval (or eliminate the bounds entirely) for use in optimization methods. This is similar to how link functions work in logistic regression: unconstrained optimization sets a parameter value in the interval (0,1) via the logit link function.
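A minimal sketch of the idea, using the logit link from the example above (function names are illustrative, not the package's API):

```julia
# theta maps the constrained parameter to the unconstrained space;
# eta (its inverse) maps back - here for a parameter in (0, 1)
theta(x) = log(x / (1 - x))      # logit: (0, 1) -> R
eta(t)   = 1 / (1 + exp(-t))     # logistic (inverse logit): R -> (0, 1)

t = theta(0.25)   # optimize freely over t in R
x = eta(t)        # recover the constrained value; x == 0.25
```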

Derivatives

Derivatives will be with respect to theta as described above.
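Concretely, if t = theta(x) is the transformed value and x = eta(t), the chain rule gives dk/dt = (dk/dx) * eta'(t). A hedged sketch, again using the logit/logistic pair purely as an example:

```julia
eta(t)  = 1 / (1 + exp(-t))         # logistic: R -> (0, 1)
deta(t) = eta(t) * (1 - eta(t))     # derivative of the logistic
# derivative of a kernel w.r.t. the transformed t, given dk/dx
dk_dt(dk_dx, t) = dk_dx * deta(t)
```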

kskyten commented 7 years ago

Sounds great! How can I help? Can you also explain the relation between this enhancement and the derivatives branch?

trthatcher commented 7 years ago

Hello!

Very early on there was an attempt at adding derivatives - that's the derivatives branch. However, it added a great deal of complexity, and I didn't feel the base Kernel type and calculation methods had been carefully planned out before all that complexity was built on top. For example, there wasn't any real consideration of the parameter constraints and how they would impact the optimization routines (this can be an issue with open intervals, such as the alpha parameter in a Gaussian kernel - not all kernels can use an unconstrained optimization method).

I've since reworked much of the package and explored how other libraries approach derivatives. Rather than having the Kernel type be a collection of floats, I've made it a collection of HyperParameter instances. The new HyperParameter type contains a pointer to a value that can be altered, as well as an Interval type that can be used to transform the parameter to a domain more amenable to optimization and to enforce constraints/invariants.
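A rough sketch of that structure (the field names here are my guesses for illustration, not necessarily the package's actual definitions):

```julia
struct Interval{T<:Real}
    lower::T
    upper::T
end

mutable struct HyperParameter{T<:Real}
    value::T                 # can be altered in place by an optimizer
    interval::Interval{T}    # bounds used to transform and constrain the value
end

# e.g. the alpha parameter of a Gaussian kernel, constrained to (0, Inf)
alpha = HyperParameter(1.0, Interval(0.0, Inf))
```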

I'm almost done with the changes I've outlined in the "Optimization" section. Unfortunately, I need to finish those first, since the derivatives have a few dependencies on them. Once that is complete, it will just be a matter of defining analytic derivatives for the parameters and a kernel/kernel matrix derivative. I can provide some more direction as soon as that's done if you'd like to help. It will be a couple more days, though.
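To give a flavor of what an analytic parameter derivative might look like (using the squared exponential kernel k(x, y) = exp(-alpha * ||x - y||^2) purely as an illustration):

```julia
# kernel applied to the squared distance d2 = ||x - y||^2
kappa(alpha, d2) = exp(-alpha * d2)

# analytic derivative with respect to the alpha parameter
dkappa_dalpha(alpha, d2) = -d2 * exp(-alpha * d2)
```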

kskyten commented 7 years ago

Excellent! I would like to help with defining the analytical derivatives. It seems that some of them have already been done in the derivatives branch.

Should #2 be closed?

trthatcher commented 7 years ago

The optimization section is basically complete save for a few tests, so it's good enough to start on the derivatives. I've updated the original comment with some detail. I've also expanded the documentation here:

http://mlkernels.readthedocs.io/en/dev/interface.html

The Hyper Parameters section may be helpful.

If you'd like to add some derivative definitions and open a PR, feel free. You can probably grab a number of them from the derivatives branch (hopefully some reusable tests, too). If you're planning to work on this over the next couple of days, I won't be working on anything myself, but I'll try to answer any questions you have.
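For the reusable tests, one generic pattern is to check each analytic derivative against a central finite difference; a self-contained sketch (the kernel, step size, and tolerance below are illustrative, not the package's actual test code):

```julia
kappa(alpha, d2) = exp(-alpha * d2)                # example kernel on squared distance
dkappa_dalpha(alpha, d2) = -d2 * exp(-alpha * d2)  # its analytic alpha-derivative

alpha, d2, h = 1.3, 0.7, 1e-6
numeric = (kappa(alpha + h, d2) - kappa(alpha - h, d2)) / (2h)
@assert isapprox(numeric, dkappa_dalpha(alpha, d2); atol=1e-8)
```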