Closed Da1sypetals closed 1 day ago
I guess all elements participating computation, whether or not it is differentiated, (except compile time constants), should be passed in as arguments? Correct me if I am wrong.
edit: Seems not even compile time constants... and operators even consumes operands, so how is differentiation computed? if a SVector<D, 12>
is cloned and used for multiple computations, is all gradients propagate to the graditent of input?
Hey there, thank you for the question. Not every operation for all combinations of dual numbers are implemented. That is not really feasible in Rust's type system. You can either pass all inputs as dual numbers
pub fn kinetic<D: DualNum<f64> + Copy>(
dq: &SVector<D, 12>,
dt: D,
mass: D,
q: SVector<D, 12>,
) -> D {
// no ext force
let qtilde = q + dq * dt;
let dq = q - qtilde;
(dq.transpose() * mass * dq)[(0, 0)] * 0.5
}
or convert them on the fly
pub fn kinetic<D: DualNum<f64> + Copy>(
dq: &SVector<f64, 12>,
dt: f64,
mass: f64,
q: SVector<D, 12>,
) -> D {
// no ext force
let qtilde = q + (dq * dt).map(D::from);
let dq = q - qtilde;
(dq.transpose() * D::from(mass) * dq)[(0, 0)] * 0.5
}
depending on what is more convenient (mostly whether you use the same variables in some other context again).
Note that I added the Copy
trait bound which makes the code look much more as it would with f64
as data type but restricts you to calling the function with statically sized dual numbers. Also because it is not possible to overload operators for builtins like f64
, I needed to swap the order of the multiplication in the last row.
Edit: It is possible to implement impl Mul<Dual> for f64
but it is afaik not really feasible to add that to the DualNum
trait.
How does Copy
trait behave? Is all computations of copies of input accounted to the gradient of the input? I cannot quite convey thru words...see this pic
(Note the red question marks)
Thanks a lot!
AD with (generalized) dual numbers is forward mode AD, so every variable stores information about its derivatives with respect to whatever variable you want to calculate derivatives for. All the copies of a are completely identical, just as if you were copying an f64
.
So it is like "every copy operation creates a new node, whose derivative info will propagate back" right? I am not familiar with (AD with (generalized) dual numbers)... seems I need to google it and learn
In forward AD there is no computational graph and no backpropagation. Broadly speaking you want to use forward AD when you have few inputs and arbitrarily many outputs and reverse AD (backpropagation) when you have many inputs. The construction of the computational graph always adds additional effort therefore the advantage of reverse AD kicks in at high number of inputs, what exatly high is depends very much on the implementation and potentially the hardware, could be around 10 vairables, or could be around 100. Because the Rust compiler optimizes all the execution paths for different derivatives at compile time, from my own experience, you can get very performant code even for a decent number of variables.
It seems I have mixed things up... Thanks a lot for explanations and I will have further check about forward AD, I'm new to it 😭
I tried to write code like this:
where ab.dq is a
DVec<f64, 12>
in nalgebra andab.dt
is simply a f64. It gives me compile error:Which makes no sense to me. I browse the docs and did not see where this kind behavior is documented, so it will be very helpful if there is some specification about what kind of things are valid and what not. Thanks!