mitsuba-renderer / enoki

Enoki: structured vectorization and differentiation on modern processor architectures

Null gradient when turning free_graph off #98

Open eliemichel opened 4 years ago

eliemichel commented 4 years ago

The following snippet prints null gradients, whereas if we use backward(c, true) we get the correct values (5.0, 2.0):

using FloatD = DiffArray<float>;
FloatD a = 2.0f;
FloatD b = 5.0f;
set_requires_gradient(a);
set_requires_gradient(b);

FloatD c = a * b;

backward(c, false);
LOG << "dc/da = " << gradient(a);
LOG << "dc/db = " << gradient(b);

Output:

dc/da = 0
dc/db = 0

Expected:

dc/da = 5.0
dc/db = 2.0

Built with MSVC16, without CUDA, commit e240a4b
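For comparison, the only difference in the working variant is the last argument of backward; everything else is identical to the snippet above:

backward(c, true);                  // free_graph = true
LOG << "dc/da = " << gradient(a);   // prints 5.0
LOG << "dc/db = " << gradient(b);   // prints 2.0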

edit: The line that zeroes the gradients is this one: https://github.com/mitsuba-renderer/enoki/blob/master/src/autodiff/autodiff.cpp#L896 I am not sure what this reference counter tracks, but shouldn't the condition be if (target.ref_count_int == 0) rather than > 0?
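In other words, something along these lines (a sketch only; I am paraphrasing the surrounding code from memory, and clear_gradient is just a hypothetical name for whatever statement actually zeroes the gradient there):

// current condition around autodiff.cpp:896 (paraphrased, not verbatim)
if (target.ref_count_int > 0)
    clear_gradient(target);   // hypothetical name for the gradient-zeroing step

// suggested condition
if (target.ref_count_int == 0)
    clear_gradient(target);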

Speierers commented 4 years ago

Hi @eliemichel ,

I'm doubtful about DiffArray<float>. IIRC, automatic differentiation in Enoki is only supported on top of CUDAArray. Do you have the same issue when using DiffArray<CUDAArray<float>> instead?
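I.e. something along these lines (untested sketch, same computation as in your snippet but on the CUDA backend, printing via std::cout instead of your LOG macro):

#include <enoki/cuda.h>
#include <enoki/autodiff.h>
#include <iostream>

using namespace enoki;

using FloatC = CUDAArray<float>;    // CUDA backend
using FloatD = DiffArray<FloatC>;   // autodiff layered on top of it

int main() {
    FloatD a = 2.0f;
    FloatD b = 5.0f;
    set_requires_gradient(a);
    set_requires_gradient(b);

    FloatD c = a * b;

    backward(c, false);             // keep the graph alive
    std::cout << "dc/da = " << gradient(a) << std::endl;
    std::cout << "dc/db = " << gradient(b) << std::endl;
    return 0;
}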

eliemichel commented 4 years ago

Regarding the other issue I don't have precise ideas, but for this one, what do you think about the suggested fix of changing line 896 of autodiff.cpp? Did I misunderstand the meaning of ref_count_int?

stefanjp commented 3 years ago

I could reproduce the problem with the "Interfacing with PyTorch" example from the documentation, just by changing FloatD.backward() to FloatD.backward(free_graph=False). I also added imports for FloatC and FloatD since the example did not run out of the box, but I guess that is unrelated. I ended up here after trying to modify the Mitsuba autodiff function render_torch so that it does not wipe the AD graph.

import torch
import enoki
from enoki.cuda_autodiff import Float32 as FloatD
from enoki.cuda import Float32 as FloatC
class EnokiAtan2(torch.autograd.Function):
    @staticmethod
    def forward(ctx, arg1, arg2):
        # Convert input parameters to Enoki arrays
        ctx.in1 = FloatD(arg1)
        ctx.in2 = FloatD(arg2)

        # Inform Enoki if PyTorch wants gradients for one/both of them
        enoki.set_requires_gradient(ctx.in1, arg1.requires_grad)
        enoki.set_requires_gradient(ctx.in2, arg2.requires_grad)

        # Perform a differentiable computation in Enoki
        ctx.out = enoki.atan2(ctx.in1, ctx.in2)

        # Convert the result back into a PyTorch array
        out_torch = ctx.out.torch()

        # Optional: release any cached memory from Enoki back to PyTorch
        enoki.cuda_malloc_trim()

        return out_torch

    @staticmethod
    def backward(ctx, grad_out):
        # Attach gradients received from PyTorch to the output
        # variable of the forward pass
        enoki.set_gradient(ctx.out, FloatC(grad_out))

        # Perform a reverse-mode traversal. Note that the static
        # version of the backward() function is being used, see
        # the following subsection for details on this
        FloatD.backward(free_graph=False)

        # Fetch gradients from the input variables and pass them on
        result = (enoki.gradient(ctx.in1).torch()
                  if enoki.requires_gradient(ctx.in1) else None,
                  enoki.gradient(ctx.in2).torch()
                  if enoki.requires_gradient(ctx.in2) else None)

        # Garbage-collect Enoki arrays that are now no longer needed
        del ctx.out, ctx.in1, ctx.in2

        # Optional: release any cached memory from Enoki back to PyTorch
        enoki.cuda_malloc_trim()

        return result

# Create enoki_atan2(y, x) function
enoki_atan2 = EnokiAtan2.apply

# Let's try it!
y = torch.tensor(1.0, device='cuda')
x = torch.tensor(2.0, device='cuda')
y.requires_grad_()
x.requires_grad_()

o = enoki_atan2(y, x)
print(o)

o.backward()
print(y.grad)
print(x.grad)

The modified example prints:

tensor([0.4636], device='cuda:0', grad_fn=)
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')

Whereas the unmodified example prints:

tensor([0.4636], device='cuda:0', grad_fn=)
tensor(0.4000, device='cuda:0')
tensor(-0.2000, device='cuda:0')