mlcommons / GaNDLF

A generalizable application framework for segmentation, regression, and classification using PyTorch
https://gandlf.org
Apache License 2.0

Corrected forward operations order #897

Closed · hongbozheng closed this 2 months ago

hongbozheng commented 2 months ago

Fixes #ISSUE_NUMBER


github-actions[bot] commented 2 months ago

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

sarthakpati commented 2 months ago

Thank you for your contribution, @hongbozheng!

Since @Geeks-Sid wrote the initial module, I am requesting a review from him.

⚠️ ⚠️ ⚠️

This PR will break backward compatibility with trained models for `unet` and all derived classes. Thus, it needs detailed discussion.
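To make the compatibility concern concrete, here is a hypothetical minimal repro (assuming an affine norm layer with learnable per-channel parameters, which is not confirmed by this thread): a checkpoint saved with the old layer shapes cannot be loaded into the reordered block, because the norm parameters change from `output_channels` to `input_channels`.

```python
import torch.nn as nn

# Old order: norm ran after ConvDS, so it was built with output_channels (64).
old_norm = nn.InstanceNorm2d(64, affine=True)
# New order: norm runs first, so it is built with input_channels (32).
new_norm = nn.InstanceNorm2d(32, affine=True)

try:
    # Loading an old checkpoint into the reordered model fails:
    # the per-channel weight/bias tensors no longer match in size.
    new_norm.load_state_dict(old_norm.state_dict())
except RuntimeError as err:
    print(f"checkpoint mismatch: {err}")
```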

hongbozheng commented 2 months ago

Also, if the operation order is changed to [input --> in --> lrelu --> ConvDS --> output], the initialization of the norm layer in `__init__`

self.in_0 = norm(output_channels, **norm_kwargs)
                 ^^^^^^^^^^^^^^^

should be

self.in_0 = norm(input_channels, **norm_kwargs)
                 ^^^^^^^^^^^^^^

since norm now becomes the first layer.
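A minimal PyTorch sketch of this norm-first ordering (the class name and layer hyperparameters here are illustrative assumptions, not GaNDLF's actual module): because the norm runs before the downsampling convolution, it must be built with `input_channels`.

```python
import torch
import torch.nn as nn

class DownConvBlock(nn.Module):
    """Sketch of a norm-first block: input -> norm -> lrelu -> ConvDS -> output."""

    def __init__(self, input_channels, output_channels, norm=nn.InstanceNorm2d):
        super().__init__()
        # norm runs first, so it sees `input_channels` feature maps
        self.in_0 = norm(input_channels)
        self.lrelu = nn.LeakyReLU(negative_slope=0.01)
        # strided conv performs the downsampling (ConvDS)
        self.conv_ds = nn.Conv2d(
            input_channels, output_channels, kernel_size=3, stride=2, padding=1
        )

    def forward(self, x):
        x = self.in_0(x)     # norm on the raw input channels
        x = self.lrelu(x)
        x = self.conv_ds(x)  # e.g. 32 -> 64 channels, spatial size halved
        return x

block = DownConvBlock(32, 64)
out = block(torch.randn(1, 32, 16, 16))
print(out.shape)  # torch.Size([1, 64, 8, 8])
```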

sarthakpati commented 2 months ago

Ah, I understand. So backward compatibility is going to break regardless of the direction. Anyway, I will wait for @Geeks-Sid to comment on this.

sarthakpati commented 2 months ago


Anyway, @hongbozheng, you might need to take a look at the errors coming from the CI:

Geeks-Sid commented 2 months ago

@hongbozheng, thank you for the PR. This makes sense, but it still needs to pass the tests.

hongbozheng commented 2 months ago

> Anyway, @hongbozheng, you might need to take a look at the errors coming from the CI:

I think the errors are caused by the issue I mentioned before: the norm layer initialization needs to be changed from

self.in_0 = norm(output_channels, **norm_kwargs)
                 ^^^^^^^^^^^^^^^

to

self.in_0 = norm(input_channels, **norm_kwargs)
                 ^^^^^^^^^^^^^^

as the error says `ValueError: expected input's size at dim=1 to match num_features (64), but got: 32.`, which is exactly caused by the norm layer initialization above.

The norm layer was originally the 3rd layer, normalizing after ConvDS (32 -> 64), which produces num_features=64; it is now the 1st layer, which takes num_features=32 directly from the input.

I just made the change and it passed all the CI tests.

szmazurek commented 2 months ago

Hey there, jumping in with a few notes from my side: the implementation now indeed looks correct in terms of compatibility with the order described in the comments, and the errors were caused by what @hongbozheng correctly identified, the wrong input dimension in the norm layer, since it is now applied at a different stage. Overall LGTM.

sarthakpati commented 2 months ago

Awesome, thank you for your comments, @Geeks-Sid and @szmazurek!

@hongbozheng - have you already signed the MLCommons CLA?

hongbozheng commented 2 months ago

> @hongbozheng - have you already signed the MLCommons CLA?

I did yesterday, but I received the email reply below.

> Thank you for submitting your request to contribute code to MLCommons GitHub repositories with the GitHub ID "hongbozheng." You are not currently listed in the designated employees section of the CLA for University of Illinois. We have reached out to our primary contact for University of Illinois regarding adding you to the CLA and will follow up with you once we hear back.

sarthakpati commented 2 months ago

> > @hongbozheng - have you already signed the MLCommons CLA?
>
> I did yesterday, but I received the email reply below.
>
> > Thank you for submitting your request to contribute code to MLCommons GitHub repositories with the GitHub ID "hongbozheng." You are not currently listed in the designated employees section of the CLA for University of Illinois. We have reached out to our primary contact for University of Illinois regarding adding you to the CLA and will follow up with you once we hear back.

Unless you are a paid employee of UI, you might need to ask them to update this to an "individual contributor"...

hongbozheng commented 2 months ago

> @hongbozheng - have you already signed the MLCommons CLA?

I signed the CLA and joined the MLCommons GitHub organization. I'm not sure what to do next to pass the cla-check.

sarthakpati commented 2 months ago

> > @hongbozheng - have you already signed the MLCommons CLA?
>
> I signed the CLA and joined the MLCommons GitHub organization. I'm not sure what to do next to pass the cla-check.

Thanks! I re-triggered the job and it passed. Welcome to the community! 😄