raskr / rust-autograd

Tensors and differentiable operations (like TensorFlow) in Rust
MIT License
484 stars 36 forks source link

Support for lgamma function #43

Closed laocaoshilaocao closed 2 years ago

laocaoshilaocao commented 3 years ago

Hi, i am using autograd to make a neural network with custom defined loss function. That needs the calculation of lgamma function of tensors during the process. I notice that there is tf.math.lgamma() in tensorflow, and i find that would be really complex to define the operation myself using autograd since the gradient of the logarithm of the fraction is really complex for me to understand. Could you have any support for that operation or do you have any advice how can i achieve that? Thanks for your help.

raskr commented 3 years ago

Sorry I'm not familiar with those kind of functions. autograd doesn't support complex numbers, and is it possible to implement that (and its derivative)?

laocaoshilaocao commented 3 years ago

Thanks for your reply. For me only the float computation is enough so it doesn't need to be extended to complex numbers. Actually the core calculation is a integral, and i also i found that this function already exists in statrs::function::gamma::{ln_gamma} for float number. So i think the computation for that operation is possible, but it still looks unclear for me how to write the derivative based on that. Do you have any idea of that?

Another small problem is that when i tried to write the computation like this:

impl<T: ag::Float> ag::op::Op<T> for Op_test {
    fn compute(
        &self,
        ctx: &mut ag::op::ComputeContext<T>,
){
        let x: &ag::NdArrayView<_> = &ctx.input(0);
        let y = x.mapv(move |a| ln_gamma(a));
        ctx.append_output(y);
}

It says mismatched types for |a|, expected f64, found type parameter T. And i think the T is the thing i can't change right, so does that mean i can't directly use other float calculation crate inside the custom defined operation?

raskr commented 3 years ago

So i think the computation for that operation is possible, but it still looks unclear for me how to write the derivative based on that. Do you have any idea of that?

About this please give me some time.

It says mismatched types for |a|, expected f64, found type parameter T. And i think the T is the thing i can't change right, so does that mean i can't directly use other float calculation crate inside the custom defined operation?

Probably you can write like this:

impl ag::op::Op<f64> for Op_test {
    fn compute(
        &self,
        ctx: &mut ag::op::ComputeContext<f64>,
){
        let x: &ag::NdArrayView<_> = &ctx.input(0);
        let y = x.mapv(move |a| ln_gamma(a));
        ctx.append_output(y);
}
laocaoshilaocao commented 3 years ago

Probably you can write like this

That works for me.

About this please give me some time.

Thanks a lot for your help, wait for your update :)

raskr commented 3 years ago

Found tensorflow's lgamma derivative implementation. Hmm looks like it requires digamma. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/gradients/math_grad.cc#L803

raskr commented 3 years ago

If we don't consider complex numbers, the gradient would be gx = gy * digamma(x). So we are going to have to implement Digamma Op and call it in the Lgamma's grad according to the above derivative.

Also found a Rusts's digamma impl: https://docs.rs/special/0.3.2/special/fn.digamma.html.

laocaoshilaocao commented 3 years ago

That looks helpful. I have had a try of this

    fn grad(&self, ctx: &mut ag::op::GradientContext<f64>) {

        let x = ctx.input(0);
        let gy = ctx.output_grad();
        let y = ctx.output();
        let gx = gy * digamma(x);
        ctx.append_input_grad(Some(gx));

    }

The issue is that the digamma function only supports float inputs, not the tensor . And i also find that there is no existing method to return an ndarray type input in ag::op::GradientContext Do u have any idea of that?

raskr commented 3 years ago

How about this? (no guarantee of correctness...)

use special::digamma as digamma_func;
use special::ln_gamma as lgamma_func;

// =======================================
// Defining digamma, which is the derivative of lgamma
// =======================================
struct Digamma;

impl<T: ag::Float> ag::op::Op<T> for Digamma {
    fn compute(
        &self,
        ctx: &mut ag::op::ComputeContext<T>,
    ) {
        let x: &ag::NdArrayView<_> = &ctx.input(0);
        let y = x.mapv(move |a| digamma_func(a));
        ctx.append_output(y);
    }

    fn grad(&self, ctx: &mut ag::op::GradientContext<T>) {
        // No problem with empty implementation this time.
        ctx.append_input_grad(None);
    }
}

use ag::tensor::Input;

// helper
fn digamma<'graph, F: ag::Float>(x: &ag::Tensor<'graph, F>, g: &'graph ag::Graph<F>)
-> ag::Tensor<'graph, F> {
    ag::Tensor::builder()
           .set_inputs(&[Input::new(x)])
           .build(g, Digamma)
}

// =============
// Defining lgamma
// =============
struct Lgamma;

impl<T: ag::Float> ag::op::Op<T> for Lgamma {
    fn compute(
        &self,
        ctx: &mut ag::op::ComputeContext<T>,
    ) {
        let x: &ag::NdArrayView<_> = &ctx.input(0);
        let y = x.mapv(move |a| lgamma_func(a).0);
        ctx.append_output(y);
    }

    fn grad(&self, ctx: &mut ag::op::GradientContext<T>) {
        let x = ctx.input(0);
        let gy = ctx.output_grad();
        let y = ctx.output();
        let gx = gy * digamma(x, ctx.graph());
        ctx.append_input_grad(Some(gx));
    }
}

// use this
fn lgamma<'graph, F: ag::Float>(x: &ag::Tensor<'graph, F>, g: &'graph ag::Graph<F>)
-> ag::Tensor<'graph, F> {
    ag::Tensor::builder()
           .set_inputs(&[Input::new(x)])
           .build(g, Lgamma)
}
laocaoshilaocao commented 3 years ago

How about this? (no guarantee of correctness...)

Thanks! That works for me after changing all type T to f64 in all methods since the lgamma_func(a).0 seems don't work for me.

I believe this problem solving can be a really good example of some mathematics custom defined tensor operation.

I also have another two related questions about the custom defined operation.

One is about the Pow operation between two tensors, i have tried to write the computation of that like this:

   fn compute(
        &self,
        ctx: &mut ag::op::ComputeContext<f64>,
    ) {
        let x: &ag::NdArrayView<_> = &ctx.input(0);
        let y: &ag::NdArrayView<_> = &ctx.input(1);
        let mut out = &x;
        Zip::from(&out)
            .and(&x)
            .and(&y)
            .apply(|o, &i_1, &i_2|{
                *o = f64::powf(i_1, i_2);
            });
        ctx.append_output(out);
    }

That has errors the trait ndarray::NdProducer is not implemented for &&&ndarray::ArrayBase<ndarray::ViewRepr<&f64>, ndarray::Dim<ndarray::IxDynImpl>>. That seems to be a weird error, do you have the idea of how can i iterate and do opeartions through the two ctx.input elements? As for the gradient of element pow, the gradient of g(x1) and g(x2) are gx1 = gy * x2 * (x1^(x2-1)), gx2 = gy * x1^x2 * lnx1 Am i correct in this part?

The second question is about the linear_diag operation. In tensorflow useing tf.linalg.diag(tf.linalg.diag_part(matrix)), i can create a new 2D matrix with only diagonal values from the original tensor. I want to achieve the same operation using Autograd

Since this kind of operation doesn't have gradient. What i was thinking is that i can mul the tensor with a unit diagonal tensor ([[1., 0., ... 0.], [0., 1., 0.,....0.], ..., [0., ... 0., 1.]]) with the same shape of original tensor, inspired by https://github.com/tensorflow/tensorflow/issues/1293.

However when i was trying to do that , i found that is hard to implement since the method g.shape() returns also a tensor and the g.constant() needs a NdArray whose shape should be a &[usize]. Do u have any idea how can i solve that problem?

Thanks a lot for your help !! (maybe i should change this Issue name to custom operation practice :)

laocaoshilaocao commented 3 years ago

Hi, inspired by your example before. I have tried to make the power between two tensors as followed:

//helper
struct Pow_h;

impl ag::op::Op<f64> for Pow_h {
    fn compute(
        &self,
        ctx: &mut ag::op::ComputeContext<f64>,
    ) {
        let x = &ctx.input(0);
        let y: &ag::NdArrayView<_> = &ctx.input(1);
        let mut out = x.mapv(move |a| a);
        for ((mut o, a), b) in out.iter_mut().zip(x.iter()).zip(y.iter()) {
            *o = f64::powf(*a,*b - 1.0);
        }
        ctx.append_output(out);
    }

    fn grad(&self, ctx: &mut ag::op::GradientContext<f64>) {
        ctx.append_input_grad(None);
    }
}

// helper
fn pow_h<'graph>(x: &Tensor<'graph>, y: &Tensor<'graph>, g: &'graph ag::Graph<f64>)
-> Tensor<'graph> {
    ag::Tensor::builder()
           .set_inputs(&[Input::new(x), Input::new(y)])
           .build(g, Pow_h)
}

struct Pow_e;

impl ag::op::Op<f64> for Pow_e {
    fn compute(
        &self,
        ctx: &mut ag::op::ComputeContext<f64>,
    ) {
        let x = &ctx.input(0);
        let y: &ag::NdArrayView<_> = &ctx.input(1);
        let mut out = x.mapv(move |a| a);
        for ((mut o, a), b) in out.iter_mut().zip(x.iter()).zip(y.iter()) {
            *o = f64::powf(*a,*b);
        }
        ctx.append_output(out);
    }

    fn grad(&self, ctx: &mut ag::op::GradientContext<f64>) {
        let x = ctx.input(0);
        let y = ctx.input(1);
        let z = ctx.output();
        let grad = ctx.output_grad();

        let gx = grad * y * pow_h(&x, &y, ctx.graph());
        let gy = grad * z * ctx.graph().ln(x);
        ctx.append_input_grad(Some(gx));
        ctx.append_input_grad(Some(gy));
    }
}

I have done some simple test for that code and it passes. However i noticed that in source you use the method maybereduce for binary ops, what is the point of using that?

laocaoshilaocao commented 3 years ago

As for the tensor.shape() problem, i think the core question is how can i use the value inside a tensor during the process? For example, in tensorflow i can use tensor.shape()[0] to have the value of the first dimension of that tensor. Is there any equivalence also in autograd ?

raskr commented 3 years ago

value of the first dimension of that tensor. Is there any equivalence also in autograd ?

You can use this: https://docs.rs/autograd/1.1.0/autograd/tensor/struct.Tensor.html#method.access_elem

laocaoshilaocao commented 3 years ago

I have tried the access_elem, but the issue i met is that i want to implement the element like this: tf.ones((1, self.mean.get_shape()[0]) However when i use g.ones(&[1, g.shape(mean).access_elem(0)]) that doesn't work since the access_elem still returns a tensor. That is also impossible for me to eval the value at this place. Do you have any idea how to solve that? Thanks for your time and help!

laocaoshilaocao commented 3 years ago

that doesn't work since the access_elem still returns a tensor.

The error message is mismatched types: expected integer, found struct autograd::Tensor

raskr commented 3 years ago

i noticed that in source you use the method maybereduce for binary ops, what is the point of using that?

You don't need maybereduce in the gradient computation because that is required only when broadcast occurs in the forward path.

That has errors the trait ndarray::NdProducer is not implemented for &&&ndarray::ArrayBase<ndarray::ViewRepr<&f64>, ndarray::Dim>. That seems to be a weird error, do you have the idea of how can i iterate and do opeartions through the two ctx.input elements?

Is that ok about you are performing inplace operation in the above example? I think you should allocate a new NdArray using uninit or something like x.mapv(move |a| a) you wrote.

since the method g.shape() returns also a tensor

Sorry it's hard to get the static shape of an autograd::Tensor because this lib doesn't support shape inference.

i can create a new 2D matrix with only diagonal values from the original tensor. I want to achieve the same operation using Autograd

Probably you have to implement a custom Op whose argument is a shape tensor like this. You may also need to use ndarray-linalg.

raskr commented 3 years ago

That works for me after changing all type T to f64 in all methods since the lgamma_func(a).0 seems don't work for me.

It may be faster to use f32 for the whole calculation and cast to f64 only before and after the lgamma_func calculation.

laocaoshilaocao commented 3 years ago

Thanks a lot for all your advice!!

Sorry it's hard to get the static shape of an autograd::Tensor because this lib doesn't support shape inference.

That is really a limitation for complex implementation. For example in batch training the training data size then must be the multiply times of batch size. However that is acceptable i believe.

I am going to work with all those problems :)

raskr commented 3 years ago

Is it undesirable to do this? (too slow?)

let first = shape.eval(&[]).unwrap()[0];
g.ones(&[1, first])

I recognize that even in TF, it is not always possible to get static shapes using get_shape.

laocaoshilaocao commented 3 years ago

Yes i also thought about doing in that way. However, in my situation the tensor shape has already included several placeholder parameters after several layers calculation. So if i do eval at that place, i believe that means i need to run the whole algorithm once, which is really inefficient. The situation is like:

let x = g.placeholder(&[-1, 1]);
let L1 = Dense_layer(&x, W1);
....(several layers and custom defined loss functions balabala)
let mean_layer = Dense_layer(&Ln, Wn);
let output = mean_layer * g.matmul(x, g.ones((1, mean.shape()[1]);

TF does the same thing like: self.output = self.mean * tf.matmul(self.x, tf.ones((1, self.mean.get_shape()[1]), dtype=tf.float32))

For me now, i think the practical alternative is to directly use the shape integer value reading from the algorithm.

raskr commented 3 years ago

i believe that means i need to run the whole algorithm once, which is really inefficient.

I see, that is a serious problem. I thought that exposing whole node's evaluation results after eval would be useful. I'll try to incorporate that into the next ver up.

laocaoshilaocao commented 3 years ago

I'll try to incorporate that into the next ver up.

Thanks, that will be cool :)

raskr commented 2 years ago

Implemented