voldemortX / pytorch-auto-drive

PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, benchmarking & deployment help
BSD 3-Clause "New" or "Revised" License

Deformable #94

Open JadenLy opened 2 years ago

JadenLy commented 2 years ago

Hello,

I am doing a school project and came across your paper on BezierLaneNet. I really like the idea and I am trying to implement it. One thing I noticed is that you opted to use mmcv for the deformable convolution layer. I wonder if you think it is possible to replace it with the deform_conv2d layer provided by PyTorch, as here. It looks like deform_conv2d should provide the same functionality, based on the paper it cites. Feel free to give me suggestions if you have any. Due to my limited computational resources, I am only running on CPU, so I hope to base the implementation on plain PyTorch if possible.
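For concreteness, here is a minimal sketch (my own, with made-up shapes) of how I understand the torchvision op; with zero offsets and an all-ones mask it should reduce to a plain 3x3 convolution:

import torch
from torchvision.ops import deform_conv2d

x = torch.randn(2, 64, 23, 40)                # [N, C, H, W]
weight = torch.randn(64, 64, 3, 3)            # [out_ch, in_ch // groups, kh, kw]
offset = torch.zeros(2, 2 * 3 * 3, 23, 40)    # [N, 2 * kh * kw, out_h, out_w]
mask = torch.ones(2, 3 * 3, 23, 40)           # [N, kh * kw, out_h, out_w], DCNv2-style modulation
out = deform_conv2d(x, offset, weight, mask=mask, padding=(1, 1))
print(out.shape)  # torch.Size([2, 64, 23, 40])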

Thanks!

voldemortX commented 2 years ago

@JadenLy This replacement is in my plans as well. But it will need some accuracy alignment, and I will need to know what impact it has on the TorchVision version requirement and on mixed precision training. It may not happen very soon. Do you have some preliminary results from replacing it that you can share?
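On the version question: if I remember correctly, the mask argument (the modulated/DCNv2 part) only appeared in relatively recent torchvision releases, so a guard along these lines might be needed (a sketch; the exact version number needs double-checking):

import torchvision
from packaging import version  # pip install packaging

# Assumption to verify: mask= support in deform_conv2d landed around torchvision 0.9.0
if version.parse(torchvision.__version__) < version.parse("0.9.0"):
    raise RuntimeError("modulated deform_conv2d (mask=...) needs a newer torchvision")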

JadenLy commented 2 years ago

@voldemortX Thanks for your response. Since this is a class project for me, I can try to figure out the implementation; as long as the training process looks okay (e.g. the loss decreases steadily), I can call it done. But I do not have enough computational power to fully verify the impact. I can provide you with my implementation once I have it, and I would appreciate any suggestions you can give on it.

voldemortX commented 2 years ago

@JadenLy That sounds great! Feel free to make a pull request once you have the implementation.

JadenLy commented 2 years ago

Hi @voldemortX Just a question about dimensions, as I am having difficulty making my implementation work. My current implementation of the FFF (feature flip fusion) module (designed to have exactly the same functionality as yours) is as follows:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d


class FeatureFlipFusion(nn.Module):
    def __init__(self, channels, kernel_size=(3, 3), groups=1, deform_groups=1) -> None:
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        # Conv2DBlock is my own wrapper that runs conv + BN (defined elsewhere)
        self.conv = Conv2DBlock(channels, channels, relu=False, kernel_size=1, padding=0)
        self.norm = nn.BatchNorm2d(channels)
        # predicts 2 offset coordinates + 1 modulation scalar per kernel position
        self.conv_offset = nn.Conv2d(
            channels * 2,
            deform_groups * 3 * self.kernel_size[0] * self.kernel_size[1],
            kernel_size=self.kernel_size,
            padding=1,
            bias=True)

        self.weight = nn.Parameter(torch.Tensor(channels, channels // groups, *kernel_size))
        self.bias = nn.Parameter(torch.Tensor(channels))

        self.init_weights()

    def init_weights(self):
        # zero-init the offset/mask branch, uniform-init the deform conv parameters
        self.conv_offset.weight.data.zero_()
        self.conv_offset.bias.data.zero_()
        n = self.channels
        for k in self.kernel_size:
            n *= k
        stdv = 1. / math.sqrt(n)
        self.weight.data.uniform_(-stdv, stdv)
        self.bias.data.zero_()

    def forward(self, x):
        flip = x.flip(-1)  # 256 * 23 * 40

        x = self.conv(x)  # 256 * 23 * 40

        # deformable branch: offsets and mask are predicted from both feature maps
        concat = torch.cat([flip, x], dim=1)  # 512 * 23 * 40
        out = self.conv_offset(concat)  # 27 * 23 * 40
        o1, o2, mask = torch.chunk(out, 3, dim=1)
        offset = torch.cat((o1, o2), dim=1)  # 18 * 23 * 40
        mask = torch.sigmoid(mask)  # 9 * 23 * 40
        flip = deform_conv2d(flip, offset, self.weight, self.bias, mask=mask)

        return F.relu(self.norm(flip) + x)

I tried to recover your settings (padding, stride, etc.) from the mmcv code. I also noted the dimensions of each variable after the function calls as inline comments above, excluding the batch size. Conv2DBlock is a class that runs conv and BN, which I believe is correct. The problem is that deform_conv2d apparently needs the offset to be 21 * 38, but the self.conv_offset layer I am using does not change the spatial dimensions here. So I wonder if you have any suggestions on the implementation. Let me know. Thanks!

voldemortX commented 2 years ago

@JadenLy Great work! I'll look into it soon.

voldemortX commented 2 years ago

> I tried to recover your settings (padding, stride, etc.) from the mmcv code. I also noted the dimensions of each variable after the function calls as inline comments above, excluding the batch size. Conv2DBlock is a class that runs conv and BN, which I believe is correct. The problem is that deform_conv2d apparently needs the offset to be 21 * 38, but the self.conv_offset layer I am using does not change the spatial dimensions here. So I wonder if you have any suggestions on the implementation. Let me know. Thanks!

@JadenLy I think you should add padding=(1,1) to the call: flip = deform_conv2d(xxx, padding=(1,1))
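That is, something like this (a sketch keeping your variable names). With a 3x3 kernel and stride 1, padding=(1, 1) keeps the output at 23 * 40, which matches the offset/mask maps your conv_offset already produces; without padding, deform_conv2d infers a 21 * 38 output and expects the offset to match that size:

flip = deform_conv2d(flip, offset, self.weight, self.bias,
                     padding=(1, 1), mask=mask)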

JadenLy commented 2 years ago

Thanks, I think your solution resolves the error!

Another question: consider the curves predicted in https://github.com/voldemortX/pytorch-auto-drive/blob/master/utils/models/lane_detection/bezier_lane_net.py#L68 and used in https://github.com/voldemortX/pytorch-auto-drive/blob/master/utils/losses/hungarian_bezier_loss.py#L49. In my experiment, I found that the curves returned from the model have a dim of [8, 22, 2], whereas the curves used for the loss need a dim of [44, 4, 2]. After some debugging, I managed to make it work by modifying the reshape of the model output to

curves.permute(0, 2, 1).reshape(curves.shape[0], -1, curves.shape[-1] // 2, 2).contiguous()

I wonder if you have encountered such an error, or if there is a mismatch in the package version I am using (latest torch) that causes it. Thanks!
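For reference, here is a generic shape check (my own sketch, with made-up numbers: B images, Q queries, P control points) of the layout I am trying to reach before the loss:

import torch

B, P, Q = 4, 4, 11
head_out = torch.randn(B, 2 * P, Q)               # raw head output: [B, 2*P, Q]
curves = head_out.permute(0, 2, 1)                # [B, Q, 2*P]
curves = curves.reshape(B, Q, P, 2).contiguous()  # [B, Q, P, 2] control points
print(curves.shape)  # torch.Size([4, 11, 4, 2])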

Also, I wonder if you have the training loss after 400 epochs on the TuSimple dataset for BezierLaneNet; just a rough number would be pretty helpful!

voldemortX commented 2 years ago

> Another question: consider the curves predicted in https://github.com/voldemortX/pytorch-auto-drive/blob/master/utils/models/lane_detection/bezier_lane_net.py#L68 and used in https://github.com/voldemortX/pytorch-auto-drive/blob/master/utils/losses/hungarian_bezier_loss.py#L49. In my experiment, I found that the curves returned from the model have a dim of [8, 22, 2], whereas the curves used for the loss need a dim of [44, 4, 2]. After some debugging, I managed to make it work by modifying the reshape of the model output to
>
> curves.permute(0, 2, 1).reshape(curves.shape[0], -1, curves.shape[-1] // 2, 2).contiguous()

I use torch 1.6 by default for training, and torch 1.8 for ONNX conversion tests. I have not experienced this issue; it seems your permute happened in-place or something. Maybe check the torch release notes for this? I'll mark this as a possible bug.

> Also, I wonder if you have the training loss after 400 epochs on the TuSimple dataset for BezierLaneNet; just a rough number would be pretty helpful!

FYI, the whole training loss is around 0.025, and the curve loss is around 0.0075. The tensorboard logs corresponding to the best resnet18 & resnet34 models are here: bezier_loss.zip

JadenLy commented 2 years ago

Hi @voldemortX , I was able to fully train the model with the deform_conv2d module from PyTorch. I implemented the model and some other functions myself while reusing some of your code. After 200 epochs with your default parameters, I got an accuracy of 88.67, while the FPR (0.31) and FNR (0.21) are both relatively high. I examined some images and found samples with bad curve fitting or extra lanes. So I would suggest running the training on your end to see whether my result comes simply from the change or there is something wrong in my implementation. Thanks!

voldemortX commented 2 years ago

@JadenLy Can you open a pull request with your dcn implementation, and I will test if it aligns with the mmcv one?