vaithak opened 2 months ago
That is a feature, not a bug, because often (in the case of ROOT) we were told that it is worth accumulating, especially for vector-valued functions…
I agree it looks like a bug but I am not sure how we can improve the situation. Maybe clarify in the documentation?
> That is a feature, not a bug, because often (in the case of ROOT) we were told that it is worth accumulating, especially for vector-valued functions…
I don't mind the accumulation, but the accumulation seems to be happening incorrectly in the second case.
Python frameworks like PyTorch accumulate too, until you call the `.zero_grad()` method (or something similar), but the accumulated result here seems to be incorrect.
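To illustrate the accumulation convention being referred to, here is a minimal plain-Python sketch. This is not PyTorch itself; the `Param` class and its method names are hypothetical stand-ins for a framework parameter with an accumulating gradient:

```python
class Param:
    """Hypothetical stand-in for a framework parameter with an accumulating gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def accumulate(self, dfdx):
        # Frameworks like PyTorch add into .grad rather than overwrite it.
        self.grad += dfdx

    def zero_grad(self):
        # Explicit reset, analogous to PyTorch's optimizer.zero_grad().
        self.grad = 0.0


x = Param(3.0)

# Two backward passes of f(x) = -x; the derivative is -1 each time.
x.accumulate(-1.0)
x.accumulate(-1.0)
print(x.grad)   # accumulated across both passes: -2.0

x.zero_grad()
x.accumulate(-1.0)
print(x.grad)   # after an explicit reset: -1.0
```

The point of the convention is that accumulation is well-defined: repeated passes must add the same per-pass derivative, so the running total stays predictable until the caller resets it.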
Ok, sorry I missed that part!
Function 1:
Function 2:
Both the functions are essentially doing `f(x) = -x`, so `df/dx = -1`. If we don't zero out the derivative parameter when passing it to the `.execute` method, the expectation is that it will accumulate the result, i.e. `_d_param += dfdx`. Doing so for the above functions results in different outputs.

Code for reproducing the issue:

Outputs: