apexys opened this issue 1 year ago
Hi, happy to see that the code is useful for experimentation!
I've not tried debugging the above, but I think the `DualArray` upsample/crop functions above don't quite follow the flow I'd expect.
Using instances of a `DualArray` is designed to be simple, but implementing an operation on `DualArray` is a bit odd:

- The `value` part is straightforward: compute the result and return it as output.
- The `loss_grad` part is odd, it goes in the reverse direction! We want to add lower-level `Array` expressions for how the gradient of the input depends on the gradient of the output. It directly implements the backpropagation step!

As such I'd expect the implementation of a `DualArray` operation to have the following pattern:

- Compute the output `value`, using `with_empty_grad()` to add a placeholder for the output grad term.
- `accumulate` onto the input grad some expression based on this output grad.
There is a bit more of an explanation of this API at https://github.com/sjb3d/descent/tree/main/examples/array_api, and hopefully the existing implementations of e.g. `sin` or `tanh` on `DualArray` are useful as reference!
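To make the value/grad split concrete, here is a minimal scalar sketch of the same pattern in plain Rust. The `Dual` struct and function names here are hypothetical and not part of descent's API (descent works on whole arrays and builds a graph); this only illustrates the flow of computing a forward value and then accumulating onto the input grad in terms of the output grad:

```rust
// Minimal scalar analogue of the DualArray pattern (hypothetical
// names, not descent's API).
struct Dual {
    value: f64, // forward value
    grad: f64,  // accumulated d(loss)/d(value), filled in backward pass
}

// Forward step: compute the output value; its grad slot starts empty,
// analogous to with_empty_grad().
fn sin(input: &Dual) -> Dual {
    Dual { value: input.value.sin(), grad: 0.0 }
}

// Backward step, run once the output grad is known:
// d(sin(x))/dx = cos(x), so accumulate out_grad * cos(x) onto the input.
fn sin_backward(input: &mut Dual, out_grad: f64) {
    input.grad += out_grad * input.value.cos();
}

fn main() {
    let mut x = Dual { value: 0.5, grad: 0.0 };
    let y = sin(&x);
    sin_backward(&mut x, 1.0); // seed d(loss)/dy = 1
    assert!((y.value - 0.5f64.sin()).abs() < 1e-12);
    assert!((x.grad - 0.5f64.cos()).abs() < 1e-12);
}
```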
Ah, that makes more sense than what I did! Thank you for the explanation!
I've rewritten cropping and upsampling.
In cropping, for the reverse path, we need to pad the incoming gradient with zeroes to make it fit with the existing one.
```rust
pub fn crop(self, left: usize, top: usize, right: usize, bottom: usize) -> Self {
    let (a, da) = self.into_inner();
    let input_shape = a.shape();
    assert_eq!(input_shape.len(), 4);
    // Compute crop shape
    let mut input_offsets: TinyVec<[isize; MAX_DIM]> =
        std::iter::repeat(0).take(input_shape.len()).collect();
    input_offsets[1] = top as isize;
    input_offsets[2] = left as isize;
    let mut output_shape = input_shape;
    output_shape[1] -= top + bottom;
    output_shape[2] -= left + right;
    // Crop input to shape
    let view = View {
        input_shape: a.shape(),
        input_offsets,
        output_mapping: (0..input_shape.len())
            .map(|i| input_shape.identity_mapping(Axis::from_index(i)))
            .collect(),
        output_shape,
    };
    let (b, db) = a.view(view).with_empty_grad();
    // We also need to pad the gradient by the crop we just did
    let padded = db.pad(1, top, bottom).pad(2, left, right);
    da.accumulate(padded);
    (b, db).into()
}
```
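The crop/pad relationship can be checked numerically with a 1-D analogue in plain Rust (no descent types; `crop`/`crop_backward` here are standalone illustration functions): the forward pass drops edge elements, and the backward pass zero-pads the incoming gradient back to the input shape, so dropped positions receive zero gradient.

```rust
// 1-D analogue of crop and its gradient.
fn crop(input: &[f64], left: usize, right: usize) -> Vec<f64> {
    input[left..input.len() - right].to_vec()
}

// Backward: zero-pad the output gradient back to the input length.
fn crop_backward(out_grad: &[f64], left: usize, right: usize) -> Vec<f64> {
    let mut in_grad = vec![0.0; left + out_grad.len() + right];
    in_grad[left..left + out_grad.len()].copy_from_slice(out_grad);
    in_grad
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0];
    assert_eq!(crop(&x, 1, 1), vec![2.0, 3.0, 4.0]);
    // Gradient flows back only into the kept positions; the cropped
    // edges get zero gradient.
    let dy = [0.5, 1.5, 2.5];
    assert_eq!(crop_backward(&dy, 1, 1), vec![0.0, 0.5, 1.5, 2.5, 0.0]);
}
```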
In upsampling, we need to shrink the gradient in X and Y by summing each incoming window of the same size as our upsampling factor. We could do this through a convolution with the channels set up exactly, but I found the `reduce_sum` function and copied some code from the `max_pool2d` function to implement "sum-pooling" with it.
```rust
pub fn upsample(self, x_grow_factor: usize, y_grow_factor: usize) -> Self {
    let (a, da) = self.into_inner();
    let input_shape = a.shape();
    assert_eq!(input_shape.len(), 4);
    let a_reshaped = a.reshape([
        input_shape[0], input_shape[1], 1, input_shape[2], 1, input_shape[3],
    ]);
    let a_broadcasted = a_reshaped.broadcast([
        input_shape[0], input_shape[1], y_grow_factor,
        input_shape[2], x_grow_factor, input_shape[3],
    ]);
    let mut output_shape = input_shape;
    output_shape[2] *= x_grow_factor;
    output_shape[1] *= y_grow_factor;
    let a_backshaped = a_broadcasted.reshape(output_shape);
    let (b, db) = a_backshaped.with_empty_grad();
    // We need to add all pixels we upsampled into the pixel they came from.
    // We can do this through sum-pooling with stride.
    // Following code basically copied from the max-pooling implementation.
    let windows = db.image_to_windows(
        (y_grow_factor, x_grow_factor),
        (y_grow_factor, x_grow_factor),
        1,
    );
    let [m, output_h, output_w, groups, filter_h, filter_w, group_nc]: [usize; 7] =
        windows.shape().try_into().unwrap();
    let summed = windows
        .reshape([
            m * output_h * output_w * groups,
            filter_h * filter_w,
            group_nc,
        ])
        .reduce_sum(1, true)
        .reshape([m, output_h, output_w, groups * group_nc]);
    da.accumulate(summed);
    (b, db).into()
}
```
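The upsample/sum-pool pairing can likewise be sanity-checked with a 1-D sketch in plain Rust (standalone illustration functions, not descent's API): the forward pass repeats each element `factor` times, and the backward pass sums each window of `factor` output gradients back into the element it came from.

```rust
// 1-D analogue of nearest-neighbour upsampling.
fn upsample(input: &[f64], factor: usize) -> Vec<f64> {
    input
        .iter()
        .flat_map(|&v| std::iter::repeat(v).take(factor))
        .collect()
}

// Backward: "sum-pooling" — each input element collects the gradients
// of every output it was copied to.
fn upsample_backward(out_grad: &[f64], factor: usize) -> Vec<f64> {
    out_grad.chunks(factor).map(|w| w.iter().sum()).collect()
}

fn main() {
    let x = [1.0, 2.0];
    assert_eq!(upsample(&x, 2), vec![1.0, 1.0, 2.0, 2.0]);
    // Each input grad is the sum over its window of output grads.
    let dy = [1.0, 2.0, 3.0, 4.0];
    assert_eq!(upsample_backward(&dy, 2), vec![3.0, 7.0]);
}
```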
I think the gradients look better now. The example compiles and I get a training loss. Now I need to build a slightly bigger example to see if it actually learns something or whether I've gotten something wrong in the gradient!
Nice, these updated functions look good to me, glad that the resulting graph compiles to working compute shaders! Good luck for the rest of your project! :)
Thanks so much for your help! If you'd like I can prepare a pull request with the two methods and an example using it for image colorization as in https://github.com/apexys/descent_unet_example/blob/main/examples/colorizing/main.rs
There was still one problem with the gradient: I hadn't checked whether the existing padding was same padding, and that caused problems when the gradients got small.
I wrote my own, probably horribly inefficient zero padding method and it seems to be more stable: https://github.com/apexys/descent/commit/4073bc665ae2805dd743f34d067cbd953883c5be
At the same time, I'm getting these weird spikes when plotting loss over training generation, so maybe something is still broken?
Hi, the existing padding code is implementing same padding. This is really just for simplicity, since it means just clamping the input indices when reading values from arrays.
For same padding, the gradient of each input value accumulates the gradients of all the outputs it gets broadcast to (since same padding causes edge values to get broadcast to multiple outputs). There are a few tests (search for "#[test]") that check values and gradients for simple examples.
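That clamped-index scheme and its gradient can be sketched in plain Rust (standalone illustration, not descent's implementation): the forward pass reads `input[clamp(i - pad)]`, so edge values are read by multiple outputs, and in the backward pass those edge values accumulate gradient from each output that read them.

```rust
// "Same" padding by index clamping: output reads a clamped input index,
// which replicates the edge values.
fn pad_same(input: &[f64], pad: usize) -> Vec<f64> {
    (0..input.len() + 2 * pad)
        .map(|i| {
            let j = (i as isize - pad as isize)
                .clamp(0, input.len() as isize - 1) as usize;
            input[j]
        })
        .collect()
}

// Backward: each input value accumulates the grads of every output
// that read it, so edge values collect grad from multiple outputs.
fn pad_same_backward(out_grad: &[f64], pad: usize, n: usize) -> Vec<f64> {
    let mut in_grad = vec![0.0; n];
    for (i, &g) in out_grad.iter().enumerate() {
        let j = (i as isize - pad as isize).clamp(0, n as isize - 1) as usize;
        in_grad[j] += g;
    }
    in_grad
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    assert_eq!(pad_same(&x, 1), vec![1.0, 1.0, 2.0, 3.0, 3.0]);
    // With all output grads equal to 1, edge inputs (read twice) get
    // grad 2 while the interior gets grad 1.
    let dy = [1.0; 5];
    assert_eq!(pad_same_backward(&dy, 1, 3), vec![2.0, 1.0, 2.0]);
}
```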
That loss graph looks a bit odd for sure! This stuff is unfortunately quite tricky to debug. Personally I found that testing each `Array` op in isolation against known results was enough to get the networks I tried working, but there could certainly still be some bugs in there!
Hi, I've been trying to extend descent to be able to create a U-Net (as in https://arxiv.org/abs/1505.04597) with it.
A U-Net is basically a bunch of convolutional layers that get concatenated with previous versions of their input.
I implemented the following two new operations in array.rs:
Upsampling: https://github.com/apexys/descent/blob/61415476ffaeda734b841e11b533092691c569b4/src/array.rs#L833-L848
Cropping: https://github.com/apexys/descent/blob/61415476ffaeda734b841e11b533092691c569b4/src/array.rs#L850-L874
The U-Net is then created as a recursive struct: each layer applies two convolution operations and, if it's not the innermost layer, also runs the inner layer on a cropped input, upsamples its output, concatenates that to the layer's own output, and passes the result through another double convolution.
Here's just the execution part; the full code is at https://github.com/apexys/descent_unet_example/blob/b828ce8c4a034d9288f316c108a72229b163f493/src/main.rs
When I create the graph, I run into the following problem:
Somewhere in the creation of the graph, a Mov operation is created, but no inputs are connected to it.
I've added logging after every stage of the optimization process, but the error seems to be there from the start (see the upper right Node n295 in https://github.com/apexys/descent_unet_example/blob/main/svgs/after_build_clusters.svg).
I've also tried isolating the problem and it seems that my upsample and crop operations are "fine", at least I couldn't get the error to show up with just them, but they might still cause problems further down the line.
Do you have an idea of what might cause this?
Thank you for your help!