Open adler-j opened 8 years ago
I strongly advise that we use the notion of the Fréchet derivative and ensure the result is a linear operator. Fréchet differentiability implies Gâteaux differentiability, and the two derivatives then agree, so the notions differ only for operators that are not Fréchet differentiable.
Well, what if the user wants to differentiate something like the absolute value function? Should we strictly disallow that, or should we allow the Gâteaux derivative in this case?
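To make the absolute value example concrete, here is a small sketch in plain Python (no ODL) that approximates the one-sided directional derivative of abs by a forward difference. The helper name `directional_derivative` and the step size `t` are assumptions chosen for illustration only.

```python
def directional_derivative(f, x, h, t=1e-8):
    """Forward-difference approximation of lim_{t -> 0+} (f(x + t*h) - f(x)) / t."""
    return (f(x + t * h) - f(x)) / t


# At x = 0 the directional derivative of abs in direction h is |h|.
d_plus = directional_derivative(abs, 0.0, 1.0)    # direction +1
d_minus = directional_derivative(abs, 0.0, -1.0)  # direction -1

# A linear map L would satisfy L(-h) = -L(h); here both values are positive.
print(d_plus, d_minus)

# Away from 0, abs is differentiable and h -> sign(x) * h is linear in h.
e_plus = directional_derivative(abs, 2.0, 1.0)
e_minus = directional_derivative(abs, 2.0, -1.0)
print(e_plus, e_minus)
```

At 0 the map from direction to derivative value is positively homogeneous but not additive, which is exactly the non-linear case under discussion.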
Is there a method, like an optimization scheme, that is based on the Gâteaux derivative and not on the Fréchet derivative? What I mean is, what is the use case where you need a Gâteaux derivative of a non-Fréchet differentiable operator?
Old topic. In general, the Gâteaux derivative is computable, but the Fréchet derivative is not. We need to compute the Gâteaux derivative to get the Fréchet derivative, if the operator is Fréchet differentiable. From the numerical side, the Gâteaux derivative is more straightforward and is always what we do.
> Is there a method, like an optimization scheme, that is based on the Gâteaux derivative and not on the Fréchet derivative? What I mean is, what is the use case where you need a Gâteaux derivative of a non-Fréchet differentiable operator?
Well, the easiest example is simply solving something like
min_x ||Ax-b||_2^2 + ||x||_1
with steepest descent and line search. Sure you expect it to be bad, but it would be nice if it at least runs.
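A rough sketch of what such a run could look like, in plain Python rather than ODL. It uses one particular subgradient, 2 A^T (Ax - b) + sign(x) with sign(0) = 0, as the stand-in for the derivative, and a simple step-halving line search. All function names and the test data are hypothetical.

```python
def residual(A, b, x):
    """Compute Ax - b for a list-of-lists matrix A and lists b, x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
            for row, b_i in zip(A, b)]


def objective(A, b, x):
    """f(x) = ||Ax - b||_2^2 + ||x||_1."""
    return sum(r * r for r in residual(A, b, x)) + sum(abs(x_j) for x_j in x)


def search_direction(A, b, x):
    """Negative of one subgradient: -(2 A^T (Ax - b) + sign(x)), sign(0) = 0."""
    r = residual(A, b, x)
    direction = []
    for j in range(len(x)):
        smooth = 2.0 * sum(A[i][j] * r[i] for i in range(len(b)))
        sign = (x[j] > 0) - (x[j] < 0)
        direction.append(-(smooth + sign))
    return direction


def steepest_descent(A, b, x0, n_iter=200):
    """Steepest descent with a step-halving line search on the objective."""
    x = list(x0)
    for _ in range(n_iter):
        d = search_direction(A, b, x)
        f_x = objective(A, b, x)
        step = 1.0
        while step > 1e-12:
            x_new = [x_j + step * d_j for x_j, d_j in zip(x, d)]
            if objective(A, b, x_new) < f_x:
                x = x_new
                break
            step *= 0.5
        else:
            break  # no step decreased the objective, so stop
    return x


A = [[1.0, 0.0], [0.0, 1.0]]
b = [2.0, 0.1]
x0 = [0.0, 0.0]
x = steepest_descent(A, b, x0)
print(x, objective(A, b, x))
```

The objective never increases because a step is only accepted when it lowers the value, but near the kinks of the 1-norm the chosen direction may stop being a descent direction, so slow progress or an early stop is to be expected.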
I actually agree with @chongchenmath. I was thinking of the Gâteaux derivative when thinking of how the notion of derivative is implemented; that was clumsy of me. Clearly, a numerical implementation would follow the Gâteaux derivative, so even though it may lead to non-linear operators for non-Fréchet differentiable operators, we should go with the Gâteaux derivative. This brings up the issue of ODL automatically considering the derivative as a linear operator. It would be correct to do that whenever the operator is Fréchet differentiable, but as with the case of the 1-norm that @adler-j provided, there are exceptions. I don't have a good suggestion here, but I would really try to avoid having two notions of derivative in ODL.
ODL itself has no opinion on linearity of derivatives. If you implement a new operator and its derivative, the derivative can return any operator, linear or non-linear. ODL won't complain.
Of course, any downstream code that silently assumes that the derivative is linear (e.g. the typical op.derivative(x).adjoint) will fail, hopefully with a good error message.
So actually there is no need for action, apart from being more clear in the documentation, and apart from the diagnostics module, where currently an error is raised when a derivative is not linear (I think).
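As an illustration of what "fail with a good error message" could mean for downstream code, here is a self-contained sketch. The classes are hypothetical stand-ins, not ODL's implementation; they only mimic the names `derivative` and `adjoint` from the comment above, plus an `is_linear` flag that is assumed here.

```python
class ScalingOperator:
    """Toy linear operator x -> c * x with a well-defined adjoint."""
    is_linear = True

    def __init__(self, c):
        self.c = c

    def __call__(self, x):
        return self.c * x

    @property
    def adjoint(self):
        return ScalingOperator(self.c)  # real scaling is self-adjoint


class AbsDirectional:
    """Toy non-linear 'derivative' of abs at 0: h -> |h|. It has no adjoint."""
    is_linear = False

    def __call__(self, h):
        return abs(h)


class AbsOperator:
    """Toy operator x -> |x| whose derivative is not linear at x = 0."""
    is_linear = False

    def __call__(self, x):
        return abs(x)

    def derivative(self, x):
        if x == 0:
            return AbsDirectional()
        return ScalingOperator(1.0 if x > 0 else -1.0)


def derivative_adjoint(op, x):
    """Return op.derivative(x).adjoint, or raise a descriptive TypeError."""
    deriv = op.derivative(x)
    if not getattr(deriv, "is_linear", False):
        raise TypeError(
            "derivative at x={!r} is not linear, so it has no adjoint".format(x))
    return deriv.adjoint


op = AbsOperator()
value = derivative_adjoint(op, -3.0)(2.0)  # derivative at -3 is scaling by -1

try:
    derivative_adjoint(op, 0.0)
    failed = False
    message = ""
except TypeError as err:
    failed = True
    message = str(err)
print(value, failed, message)
```

The design choice is that the operator itself stays permissive, while code that needs linearity checks for it explicitly at the point of use.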
I guess this is an "eternal issue" and can be closed since discussion seems to have stopped?
It would be nice to, in some sense, conclude the discussion before closing it. We consider Gâteaux derivatives, and should therefore remove all tests checking that derivative operators are linear?
The online documentation says the following right now:
The Gateaux derivative
The concept of directional derivative can also be extended to Banach spaces, giving the Gateaux derivative. The Gateaux derivative is more general than the Fréchet derivative, but is not always a linear operator. An example of a function that is Gateaux differentiable but not Fréchet differentiable is the absolute value function. For this reason, when we write “derivative” in ODL, we generally mean the Fréchet derivative, but in some cases the Gateaux derivative can be used via duck-typing.
> It would be nice to, in some sense, conclude the discussion before closing it. We consider Gâteaux derivatives, and should therefore remove all tests checking that derivative operators are linear?
That's a good point, the diagnostics shouldn't flag those things as errors, maybe downgrade the message to "info". Apart from that, I think we're largely done with the discussion.
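One way such a relaxed check might look, as a sketch in plain Python that is independent of the actual diagnostics module. The function name, logger name, and tolerance are assumptions. The check only samples two directions, so passing it is necessary for linearity but not sufficient.

```python
import logging

logger = logging.getLogger("derivative_diagnostics")  # hypothetical logger name


def check_derivative_linearity(deriv, h1, h2, a=2.0, tol=1e-10):
    """Test additivity and homogeneity of `deriv` on two sample directions.

    Returns True if both hold within `tol`. A failure is logged at INFO
    level instead of raising, since a Gateaux derivative may be non-linear.
    """
    additive = abs(deriv(h1 + h2) - (deriv(h1) + deriv(h2))) <= tol
    homogeneous = abs(deriv(a * h1) - a * deriv(h1)) <= tol
    is_linear = additive and homogeneous
    if not is_linear:
        logger.info("derivative is not linear (additive=%s, homogeneous=%s)",
                    additive, homogeneous)
    return is_linear


# Derivative of abs at a negative point: h -> -h, which is linear.
linear_result = check_derivative_linearity(lambda h: -h, 1.0, -3.0)

# Directional derivative of abs at 0: h -> |h|, which is not additive.
nonlinear_result = check_derivative_linearity(abs, 1.0, -3.0)
print(linear_result, nonlinear_result)
```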
Isn't this largely settled now?
Well, there's still the issue of overly strict checks in the diagnostics, and probably some text references to Fréchet derivatives. Either we make a new issue to fix that, or keep this one open (and change the title).
I changed the title since #1324 addresses the documentation part.
As noticed in PR #668, we don't have a definition of derivative that we use consistently throughout the code. In many (most?) places we mean the Fréchet derivative (linear approximation), but in some places we mean the more general Gâteaux derivative (directional derivative).
How do we want to deal with this? We basically have two options:
1. Enforce that all op.derivative are linear, which implies that we only use the Fréchet derivative.
2. Allow op.derivative to be non-linear, i.e. allow the Gâteaux derivative.