itt-ustutt / num-dual

Generalized (hyper-) dual numbers in rust
Apache License 2.0
60 stars 7 forks

How to use this crate? #84

Closed Da1sypetals closed 1 day ago

Da1sypetals commented 4 days ago

I tried to write code like this:

pub fn kinetic<D: DualNum<f64>>(ab: &AffineBody, q: SVector<D, 12>) -> D {
    // no ext force
    let qtilde = q + ab.dq * ab.dt;
    let dq = q - qtilde;    
    (0.5 * (dq.transpose() * ab.mass * dq)[(0, 0)]).into()
}

where ab.dq is an SVector<f64, 12> from nalgebra and ab.dt is simply an f64. It gives me a compile error:

error[E0277]: cannot add `Matrix<f64, Const<12>, Const<1>, ArrayStorage<f64, 12, 1>>` to `Matrix<D, Const<12>, Const<1>, ArrayStorage<D, 12, 1>>`
  --> src/affinebody/body.rs:29:20
   |
29 |     let qtilde = q + ab.dq * ab.dt;
   |                    ^ no implementation for `Matrix<D, Const<12>, Const<1>, ArrayStorage<D, 12, 1>> + Matrix<f64, Const<12>, Const<1>, ArrayStorage<f64, 12, 1>>`
   |
   = help: the trait `Add<Matrix<f64, Const<12>, Const<1>, ArrayStorage<f64, 12, 1>>>` is not implemented for `Matrix<D, Const<12>, Const<1>, ArrayStorage<D, 12, 1>>`
help: consider introducing a `where` clause, but there might be an alternative better way to express this requirement
   |
27 | pub fn kinetic<D: DualNum<f64>>(ab: &AffineBody, q: SVector<D, 12>) -> D where Matrix<D, Const<12>, Const<1>, ArrayStorage<D, 12, 1>>: Add<Matrix<f64, Const<12>, Const<1>, ArrayStorage<f64, 12, 1>>> {
   |                                                                          +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This makes no sense to me. I browsed the docs and did not see where this kind of behavior is documented, so it would be very helpful if there were some specification of which kinds of operations are valid and which are not. Thanks!

Da1sypetals commented 4 days ago

I guess all elements participating in the computation, whether or not they are differentiated (except compile-time constants), should be passed in as arguments? Correct me if I am wrong.

Edit: It seems not even compile-time constants... and the operators even consume their operands, so how is differentiation computed? If an SVector<D, 12> is cloned and used for multiple computations, do all the gradients propagate back to the gradient of the input?

prehner commented 4 days ago

Hey there, thank you for the question. Not every operation for every combination of dual numbers is implemented; that is not really feasible in Rust's type system. You can either pass all inputs as dual numbers

pub fn kinetic<D: DualNum<f64> + Copy>(
    dq: &SVector<D, 12>,
    dt: D,
    mass: D,
    q: SVector<D, 12>,
) -> D {
    // no ext force
    let qtilde = q + dq * dt;
    let dq = q - qtilde;
    (dq.transpose() * mass * dq)[(0, 0)] * 0.5
}

or convert them on the fly

pub fn kinetic<D: DualNum<f64> + Copy>(
    dq: &SVector<f64, 12>,
    dt: f64,
    mass: f64,
    q: SVector<D, 12>,
) -> D {
    // no ext force
    let qtilde = q + (dq * dt).map(D::from);
    let dq = q - qtilde;
    (dq.transpose() * D::from(mass) * dq)[(0, 0)] * 0.5
}

depending on what is more convenient (mostly whether you use the same variables in some other context again).

Note that I added the Copy trait bound, which makes the code look much more like it would with f64 as the data type, but restricts you to calling the function with statically sized dual numbers. Also, because it is not possible to overload operators for builtins like f64, I needed to swap the order of the multiplication in the last row.
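
To illustrate the operand order with a minimal sketch (using only the trait bounds that already appear above):

use num_dual::DualNum;

// A generic dual number can be multiplied *by* an f64, so the plain scalar
// has to appear on the right-hand side in generic code.
fn half<D: DualNum<f64>>(x: D) -> D {
    x * 0.5 // compiles: D: Mul<f64, Output = D>
    // 0.5 * x would need an `impl Mul<D> for f64`, which cannot be written
    // for a generic D, hence the swapped order in the snippets above.
}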

Edit: It is possible to write an impl Mul<Dual> for f64, but afaik it is not really feasible to add that to the DualNum trait.
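
And as a rough sketch of how the second variant could be driven to get the gradient with respect to q by seeding one component at a time with a scalar Dual64 (I am assuming the Dual64::new(re, eps) constructor and the public eps field here, so check the docs for your version; the vector dual numbers in the crate can compute the whole gradient in a single pass):

use nalgebra::SVector;
use num_dual::Dual64;

// assumes the convert-on-the-fly `kinetic` from the second snippet is in scope
fn main() {
    let dq = SVector::<f64, 12>::repeat(0.1);
    let dt = 0.01;
    let mass = 2.0;
    let q0 = SVector::<f64, 12>::repeat(1.0);

    let mut grad = [0.0; 12];
    for i in 0..12 {
        // promote q to dual numbers, seeding eps = 1 on the i-th component only
        let q: SVector<Dual64, 12> = SVector::from_fn(|j, _| {
            Dual64::new(q0[j], if i == j { 1.0 } else { 0.0 })
        });
        grad[i] = kinetic(&dq, dt, mass, q).eps; // d(kinetic)/d(q_i)
    }
    println!("{:?}", grad);
}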

Da1sypetals commented 4 days ago

How does the Copy trait behave? Are all computations on copies of the input accounted to the gradient of the input? I cannot quite convey it through words... see this picture: [image] (note the red question marks). Thanks a lot!

prehner commented 4 days ago

AD with (generalized) dual numbers is forward mode AD, so every variable stores information about its derivatives with respect to whatever variable you want to calculate derivatives for. All the copies of a are completely identical, just as if you were copying an f64.
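
A minimal sketch of what that means, relying only on the DualNum trait bound that was already used above:

use num_dual::DualNum;

// Copying a dual number copies its derivative information together with its
// value, so every use of a copy contributes to the same derivative through
// the ordinary chain rule; no graph is recorded anywhere.
fn uses_copies<D: DualNum<f64> + Copy>(x: D) -> D {
    let a = x; // identical copy: same value, same derivative part
    let b = x; // another identical copy
    a * b + a // d/dx (x*x + x) = 2x + 1, accumulated automatically
}

fn main() {
    // f64 itself implements DualNum<f64>, so the same function also runs on
    // plain floats; called with a dual number it additionally carries d/dx.
    println!("{}", uses_copies(3.0_f64)); // prints 12
}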

Da1sypetals commented 4 days ago

So it is like "every copy operation creates a new node, whose derivative info will propagate back", right? I am not familiar with AD using (generalized) dual numbers... it seems I need to google it and learn.

prehner commented 4 days ago

In forward AD there is no computational graph and no backpropagation. Broadly speaking, you want to use forward AD when you have few inputs and arbitrarily many outputs, and reverse AD (backpropagation) when you have many inputs. Constructing the computational graph always adds extra effort, so the advantage of reverse AD only kicks in at a high number of inputs. What exactly counts as high depends very much on the implementation and potentially the hardware; it could be around 10 variables, or it could be around 100. Because the Rust compiler optimizes all the execution paths for the different derivatives at compile time, you can, from my own experience, get very performant code even for a decent number of variables.
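
As a concrete sketch of a single forward pass with a scalar dual number (again assuming Dual64 exposes a new(re, eps) constructor and public re/eps fields; the exact names may differ between versions):

use num_dual::{Dual64, DualNum};

fn f<D: DualNum<f64> + Copy>(x: D) -> D {
    x.powi(3) * 0.5 + x.sin()
}

fn main() {
    // Seed the input with derivative 1; the dual number then carries both the
    // value and d/dx through every operation, with no graph and no backward pass.
    let x = Dual64::new(2.0, 1.0);
    let y = f(x);
    println!("f(2)  = {}", y.re);  // value part
    println!("f'(2) = {}", y.eps); // derivative part: 1.5*x^2 + cos(x)
}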

Da1sypetals commented 4 days ago

It seems I have mixed things up... Thanks a lot for the explanations, I will read up further on forward AD, I'm new to it 😭