taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

[RFC] Add forward mode for autodiff #5055

Open erizmr opened 2 years ago

erizmr commented 2 years ago

In this issue, we would like to share a draft implementation plan for forward-mode autodiff.

Background

In general, there are two modes for autodiff: reverse mode and forward mode. Each mode has its advantage in different scenarios. Reverse mode is more efficient when the number of inputs is much larger than the number of outputs (e.g., machine learning cases, with thousands of trainable parameters and one scalar loss). Conversely, forward mode is more efficient when the number of outputs is much larger than the number of inputs. In addition, second-order derivatives can be computed efficiently by combining the forward and reverse modes.
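To make the trade-off concrete, here is a minimal dual-number sketch in plain Python (purely illustrative, not Taichi's implementation): forward mode carries a derivative ("dot") alongside every value, so a single pass yields the derivative with respect to one seeded input. With n inputs, recovering the full gradient this way takes n passes, whereas one reverse pass yields the derivative of one output with respect to all inputs at once.

class Dual:
    """A value paired with its derivative w.r.t. one chosen input."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._wrap(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x  # f'(x) = 6x + 2

y = f(Dual(2.0, dot=1.0))  # seed dx/dx = 1
print(y.val, y.dot)  # 16.0 14.0: value and derivative in one pass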

For a roadmap for the autodiff feature in Taichi, please check out #5050.

Goals

Implementation Roadmap

Discussions

Currently, in reverse mode, two kernels are compiled: the original kernel, which evaluates the function values, and the grad kernel, which computes the gradients. In forward mode, however, the derivatives are computed eagerly during function evaluation, i.e., the function values and the derivatives can be computed with a single kernel. This raises the question of whether one kernel or two should be compiled.

Update: three kinds of kernels are now generated, depending on the autodiff mode: primal, forward AD, and reverse AD. See #5098.
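For context, the forward-mode user interface that later shipped in Taichi exposes "dual" fields and a ti.ad.FwdMode context manager; a minimal sketch, assuming that post-RFC API (Taichi >= 1.1):

import taichi as ti

ti.init()

# Forward mode tracks "dual" values instead of "grad" values.
x = ti.field(float, shape=(), needs_dual=True)
y = ti.field(float, shape=(), needs_dual=True)

@ti.kernel
def f():
    y[None] = x[None]**3 + 2 * x[None]**2 - 3 * x[None] + 1

x[None] = 1.0
# A single forward pass computes the value and the derivative together.
with ti.ad.FwdMode(loss=y, param=x):
    f()

print(y[None], y.dual[None])  # 1.0 4.0: f(1) and df/dx at x = 1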

victoriacity commented 2 years ago

I wonder if explicitly differentiating a function, as in JAX, will be supported. For example:

@ti.func
def f(x):
    return x**3 + 2*x**2 - 3*x + 1

dfdx = forward(f)  # `forward` is the proposed forward-mode transform

@ti.kernel
def k() -> float:
    return dfdx(1.0)

k()  # returns 4.0
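For reference, the JAX pattern being alluded to looks roughly like this (jax.jacfwd is JAX's forward-mode transform; jax.grad is the reverse-mode one):

import jax

def f(x):
    return x**3 + 2 * x**2 - 3 * x + 1

dfdx = jax.jacfwd(f)  # forward mode; jax.grad(f) would use reverse mode
print(dfdx(1.0))      # 4.0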
erizmr commented 2 years ago

I think it is possible to support similar features. A naive equivalent in current Taichi is:

import taichi as ti

ti.init()

x = ti.field(float, shape=(), needs_grad=True)
y = ti.field(float, shape=(), needs_grad=True)

@ti.kernel
def f():
    y[None] += x[None]**3 + 2*x[None]**2 - 3*x[None] + 1

def dfdx(_x):
    x[None] = _x
    # Seed the output gradient, then run the reverse-mode kernel.
    y.grad[None] = 1.0
    f.grad()
    return x.grad[None]

print(dfdx(1.0))  # 4.0
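One caveat with this sketch: Taichi's grad kernels accumulate into the .grad fields, so calling dfdx more than once would keep adding to x.grad. Repeated evaluations need x.grad[None] reset to zero before each f.grad() call.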

For the more general case, we may need to specify the inputs and the output if we want to generate dfdx for users. A possible implementation might be:

import taichi as ti

ti.init()

x1 = ti.field(float, shape=(), needs_grad=True)
x2 = ti.field(float, shape=(), needs_grad=True)
x3 = ti.field(float, shape=(), needs_grad=True)
y = ti.field(float, shape=(), needs_grad=True)

@ti.kernel
def f():
    y[None] += x1[None]**3 + 2*x2[None]**2 - 3*x3[None] + 1

def backward(f, input_fields, out_field):
    def _dfdx(inputs):
        # Load the input values into their fields.
        for field, value in zip(input_fields, inputs):
            field[None] = value
        # Seed the output gradient, then run the reverse-mode kernel.
        out_field.grad[None] = 1.0
        f.grad()
        # Collect the gradient with respect to each input.
        return [field.grad[None] for field in input_fields]
    return _dfdx

dfdx = backward(f, [x1, x2, x3], y)

print(dfdx([1.0, 2.0, 3.0]))  # [3, 8, -3]
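Note that a single reverse pass recovers the whole gradient here precisely because f maps three inputs to one scalar output. The forward mode proposed in this issue would instead need one pass per input (one seed each), which is the inputs-versus-outputs trade-off described in the background section.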