Missing twist in kernel definition (Chapter Kernel learning)

kjappelbaum commented 2 years ago

In the text you promise

I will add one small twist though: dividing by the dimension. This makes our kernel output magnitude is independent of the number of dimensions of x.

However, the definition is simply

def kernel(x1, x2):
    return jnp.sqrt(jnp.mean((x1 - x2) ** 2))

However, I think it would be better to switch to something like the dot product kernel (as the Euclidean distance violates Mercer's condition - I think also above, where you define the kernel, it might make sense to not use the Euclidean distance kernel as it is not 100% correct).

Also here, happy to make a PR.

kjappelbaum commented 2 years ago

(so what I personally do is that I define a simple RBF kernel with lengthscale 1 in the lecture where I do the live coding and then let the students play with the lengthscale in the exercise)

whitead commented 2 years ago

Hi! Yes, the kernel learning chapter is not in great shape. Happy to have a PR on this.

Yes, cosine distance would make much more sense.

I think the "twist" thing was the use of mean, but I don't remember where I got that from or if it works well.

I usually skip this chapter when teaching and just haven't looked at it much since I wrote it. Any help is appreciated!

whitead commented 1 year ago

Fixed by #228

whitead / dmol-book

Missing twist in kernel definition (Chapter Kernel learning) #198