simple-complexities / simple-complexities.github.io

Ameya's math and CS blog.
https://simple-complexities.github.io

Functional Gradient Descent #3

Open ameya98 opened 4 years ago

ameya98 commented 4 years ago

Comment on the article here, and submit a PR if you find anything that should be changed!

fwhigh commented 3 years ago

Wonderful write-up, thank you!

ghost commented 3 years ago

Hi ameya98, there is a typo in the 'Functionals' section, where you define 'The sum functional': the summation does not need dx because it is not an integral.

ghost commented 3 years ago

Hi again ameya98, there is another typo in the section 'Derivatives of Functionals'. Check this sentence: "where Df(x) is a size n row vector, with which the take the dot product of the direction h with."

ghost commented 3 years ago

When calculating the derivative of E(f) = ∥f∥^2, you end up with E(f) + 2f⋅h + h⋅h. How did you deduce that the derivative is 2f? I would have expected the following:

E(f) + 2f⋅h + h⋅h = E(f) + (2f + h)⋅h

so the derivative would be (2f + h). Here I tried to match the form E(x) + DE(x)⋅h that you introduced in your formula. The presence of h in (2f + h) makes the calculation look inconsistent. Could you please tell me how you obtained 2f without any h?

ghost commented 3 years ago

In the last section, how did you end up with the following formula? α_{f_{t+1}} = 2η(y − f_t(x)) + (1 − 2λη) α_{f_t}

It does not look like the steps of gradient descent, so I need some more explanation. In fact, the last section of the article needs a lot more explanation; there are a lot of gray areas.

ameya98 commented 3 years ago

Thanks @meam64 for identifying some typos. I'll fix those soon. For the calculation of the derivative, note that only the terms linear in h, and not 'higher order' terms like h ⋅ h, appear in the derivative. In the limit as || h || -> 0, these 'higher order' terms go to 0 faster than || h || itself, which is why we can ignore them.

As a parallel, think of the derivative of f(x) = x^2. Here too, you would keep only the terms linear in h when constructing the derivative (2x, not 2x + h).
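
To make the limit argument concrete, here is a rough numerical sketch (not taken from the article). It assumes f is represented by a handful of sample values, so that E(f) = ||f||^2 becomes a plain sum of squares; the values of `f` and `direction` are made up for illustration. It checks that the remainder E(f + h) - E(f) - 2f ⋅ h, divided by || h ||, shrinks as h shrinks, which is what lets us call 2f the derivative.

```python
import math

def E(f):
    """Squared-norm functional E(f) = ||f||^2 for f given by finitely many samples."""
    return sum(v * v for v in f)

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

f = [1.0, -2.0, 0.5]          # hypothetical sample values of f
direction = [0.3, 0.4, -1.2]  # hypothetical direction along which h shrinks

ratios = []
for scale in (1.0, 0.1, 0.01, 0.001):
    h = [scale * d for d in direction]
    f_plus_h = [a + b for a, b in zip(f, h)]
    linear = dot([2 * v for v in f], h)      # candidate derivative 2f applied to h
    remainder = E(f_plus_h) - E(f) - linear  # algebraically equal to h . h
    ratios.append(abs(remainder) / math.sqrt(dot(h, h)))
    print(f"scale={scale:g}  remainder={remainder:.3e}  remainder/||h||={ratios[-1]:.3e}")
```

The last column equals || h || up to rounding, so it goes to 0 with h. If the derivative were (2f + h) instead, it would depend on the step h itself, which a derivative at f should not.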

ghost commented 3 years ago

Hi @ameya98, thanks for the response, but I need more information. In each of the 3 examples, you have taken a different route to find the derivative. Are you calculating E(f+h) and then plugging it into the formula with a limit? This is what I tried, and my results are different from the answers you have written. Could you please describe this in more detail? I can easily work with a scalar x and calculate derivatives, but I would like you to show the correct way of computing the derivative of a functional, not just applying the scalar formulas. Moreover, could you please tell me what your reference was for writing all this? I need to read more about the topic. Thanks

ngiann commented 3 years ago

This article is really great, thanks @ameya98 for writing it!