voldemortX / pytorch-auto-drive

PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, benchmarking & deployment help
BSD 3-Clause "New" or "Revised" License

[Possible BC-Break] turn RESA and SCNN to non-inplace style, #121 #139

Closed PannenetsF closed 1 year ago

PannenetsF commented 1 year ago

test_sc.py.txt — attached is a simple test bench for the behavior.

I've tested the forward pass with a CULane evaluation and saw no performance change, but I do not have access to a PC with torch 1.6. Could you help test the backward pass with the script?

Also, I seldom use torch.jit to export ONNX; could you help with the is_tracing logic in the PR?
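For reference, a minimal sketch of the kind of branch I mean (my assumption, not the actual PR code), keeping the old traceable path only for export:

import torch

def spatial_conv_forward(output, inplace_path, non_inplace_path):
    # Hypothetical callables: inplace_path is the old traceable slice-assignment
    # loop, non_inplace_path is the autograd-safe version from this PR.
    if torch.jit.is_tracing():
        return inplace_path(output)
    return non_inplace_path(output)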

Thanks!

voldemortX commented 1 year ago

@PannenetsF Thanks! But the code looks a bit long for SCNN; could it just be add_ -> add like in RESA? I'll test things later, maybe tomorrow.

PannenetsF commented 1 year ago

The problem with SCNN is that it assigns into a slice of the output tensor at each step, which means each operation has to be made independent (non-inplace) rather than a simple add_ -> add change.
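For illustration, a minimal sketch (not the exact PR code) of a non-inplace downward pass: collect the updated row slices in a list and concatenate at the end, so no tensor that autograd needs is overwritten.

import torch
import torch.nn.functional as F

def scnn_down_non_inplace(output, conv_d):
    # conv_d is assumed to be the 1xk message-passing conv that preserves width.
    rows = [output[:, :, 0:1, :]]
    for i in range(1, output.shape[2]):
        # Use the already-updated previous row, but never write into `output`.
        rows.append(output[:, :, i:i + 1, :] + F.relu(conv_d(rows[-1])))
    return torch.cat(rows, dim=2)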


voldemortX commented 1 year ago

@PannenetsF I tested on CPU and your implementation matches my old code in both the forward pass and the gradients. I've also verified that the old traceable code has the same effect as your SCNN implementation:

# Down
for i in range(1, output.shape[2]):
    output[:, :, i:i + 1, :] = output[:, :, i:i + 1, :].add(F.relu(self.conv_d(output[:, :, i - 1:i, :])))
# Up
for i in range(output.shape[2] - 2, 0, -1):
    output[:, :, i:i + 1, :] = output[:, :, i:i + 1, :].add(
        F.relu(self.conv_u(output[:, :, i + 1:i + 2, :])))
# Right
for i in range(1, output.shape[3]):
    output[:, :, :, i:i + 1] = output[:, :, :, i:i + 1].add(F.relu(self.conv_r(output[:, :, :, i - 1:i])))
# Left
for i in range(output.shape[3] - 2, 0, -1):
    output[:, :, :, i:i + 1] = output[:, :, :, i:i + 1].add(
        F.relu(self.conv_l(output[:, :, :, i + 1:i + 2])))

This could save a lot of code, what do you think? Is there a performance gap between the two? The time complexity seems to be the same.
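If it helps, a rough timing sketch (assuming two comparable modules, e.g. the old and new SpatialConv implementations; the example input size is hypothetical) to check for a forward-time gap:

import time
import torch

def avg_forward_time(module, x, n=20):
    # Average wall-clock forward time over n runs, no autograd.
    module.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(n):
            module(x)
        return (time.perf_counter() - start) / n

# e.g.:
# x = torch.rand(2, 128, 36, 100)
# print(avg_forward_time(SpatialConvOriginal(), x), avg_forward_time(SpatialConv(), x))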

PannenetsF commented 1 year ago

I think it might still be an inplace version, and I will test it in PyTorch 1.10.


voldemortX commented 1 year ago

Reference unit test code: test_20230223.zip

voldemortX commented 1 year ago


Thanks! Let me know the test results when you finish.

voldemortX commented 1 year ago

Just to be sure, the problem is that newer PyTorch versions raise errors when using the inplace version in training, right?

PannenetsF commented 1 year ago

Yes, the inplace operation can break backpropagation during training.
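A tiny standalone illustration (not from this repo) of the failure mode: an inplace write into a tensor that autograd saved for backward invalidates it, so backward raises.

import torch
import torch.nn.functional as F

x = torch.rand(2, 4, requires_grad=True)
y = F.relu(x)          # autograd saves the ReLU output for its backward
y[:, 0] = y[:, 0] * 2  # inplace write bumps the saved tensor's version counter
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)  # "... modified by an inplace operation ..."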


PannenetsF commented 1 year ago

Interesting, the test bench passed. I think I might have made a typo earlier.

> Just to be sure, the problem is that newer PyTorch versions raise errors when using the inplace version in training, right?

import torch

# SpatialConvOriginal (old inplace SCNN) and SpatialConv (new non-inplace SCNN)
# come from the attached test bench.
try:
    ori_conv = SpatialConvOriginal()
    x = torch.rand(2, 128, 10, 10)
    # A preceding conv + ReLU gives the input an autograd history,
    # which is what exposes the inplace problem.
    inconv = torch.nn.Conv2d(128, 128, 3)
    y = inconv(x)
    y = torch.nn.functional.relu(y)
    y = ori_conv(y)
    y.sum().backward()
    print('original scnn is good')
except Exception as e:
    print('original scnn is not good')
    print(e)

new_conv = SpatialConv()
x = torch.rand(2, 128, 10, 10)
inconv = torch.nn.Conv2d(128, 128, 3)
y = inconv(x)
y = torch.nn.functional.relu(y)
y = new_conv(y)
y.sum().backward()
print('new scnn is good')

And the output turns out to be:

original scnn is not good
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 128, 8, 8]], which is output 0 of ReluBackward0, is at version 26; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
new scnn is good
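As the error hint suggests, a minimal sketch of enabling anomaly detection to locate the offending forward op:

import torch

torch.autograd.set_detect_anomaly(True)
# Re-running the failing backward now also prints the traceback of the
# forward op that produced the tensor modified in place.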

You could test it in your environment, thanks!

voldemortX commented 1 year ago

I get the same test results. Interestingly, the old SCNN code does not fail when used alone (without previous layers). I'll merge this, and after I resolve the timm version issue, this repo will fully support recent PyTorch versions.

Possible BC-Break: Since SCNN is rather slow to begin with, the FPS tests will remain unchanged.

@PannenetsF Thanks a lot!

PannenetsF commented 1 year ago

Thank you, and thanks for the pytorch-auto-drive repo, too! Looking forward to further cooperation!