iamanigeeit opened this issue 3 months ago
@iamanigeeit
In general, if all minimally removable structures are zero-invariant, then the output deviations should be nearly 0, as you might have seen in other DNN sanity checks (see Section 3 of the OTOv3 manuscript).
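To make the check concrete, here is a minimal self-contained sketch of the idea in plain PyTorch (a toy two-conv network with illustrative names, not our actual test code): zero out one output channel's filter plus the downstream weights that read it, build a structurally smaller copy with that channel removed, and compare the maximum output deviation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Full" toy network: conv1 -> ReLU -> conv2. Output channel k of conv1 is the
# minimally removable structure; it is zero-invariant because zeroing its
# filter/bias and the matching input slice of conv2 makes its contribution 0.
conv1 = nn.Conv2d(3, 8, 3, padding=1)
conv2 = nn.Conv2d(8, 4, 3, padding=1)
full = nn.Sequential(conv1, nn.ReLU(), conv2)

k = 5  # channel to "prune"
with torch.no_grad():
    conv1.weight[k].zero_()
    conv1.bias[k].zero_()
    conv2.weight[:, k].zero_()

# Structurally compressed network: channel k removed instead of zeroed out.
keep = [i for i in range(8) if i != k]
small1 = nn.Conv2d(3, 7, 3, padding=1)
small2 = nn.Conv2d(7, 4, 3, padding=1)
with torch.no_grad():
    small1.weight.copy_(conv1.weight[keep])
    small1.bias.copy_(conv1.bias[keep])
    small2.weight.copy_(conv2.weight[:, keep])
    small2.bias.copy_(conv2.bias)
compressed = nn.Sequential(small1, nn.ReLU(), small2)

x = torch.randn(1, 3, 32, 32)  # fixed dummy input
with torch.no_grad():
    dev = (full(x) - compressed(x)).abs().max().item()
print(f"max output deviation: {dev:.2e}")  # ~1e-7 or below, i.e. numerically zero
```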
For convnexttiny, we provided two cases of sanity checks, one with the pretraining kept and one for the vanilla network. The difference between them is disabling one singleton nn.Parameter, i.e., gamma, and excluding a few node groups from pruning. In my memory, both versions should yield nearly zero deviation. But when I double-checked, the pretrained one is fine, yet the vanilla one yields a deviation of a similar magnitude to yours. That is indeed a bit abnormal.
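(For reference, a hedged sketch of what "disabling the singleton gamma parameter" could look like in code; the parameter name gamma and the idea of collecting these names into an exclusion list are assumptions for illustration, not the library's actual API.)

```python
import torch.nn as nn

def find_singleton_gammas(model: nn.Module) -> list[str]:
    """Collect 1-D parameters named 'gamma' (e.g. ConvNeXt's LayerScale),
    so a pruning setup could be told to leave them untouched."""
    return [name for name, p in model.named_parameters()
            if name.split(".")[-1] == "gamma" and p.dim() == 1]

# unprunable = find_singleton_gammas(model)  # hypothetical exclusion list
```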
My gut feeling is that it comes from the vanilla network considering pruning over grouped conv, as I mentioned in another issue. Something may have been broken in a previous commit to cause such a large difference. :(
In addition, for transformers, all of the MLP layers that I have encountered are zero-invariant. Attention layers are case-by-case, e.g., BERT's is zero-invariant, LLaMA's is not, TNLG's and Phi-2's are, etc.
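For the MLP case, here is a small sketch of why zeroing a hidden unit is exactly equivalent to removing it (plain PyTorch, not any particular model's code): zero row i of fc1 and column i of fc2; since GELU(0) = 0, the unit contributes nothing, so a structurally smaller MLP gives an identical output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, d_ff, i = 16, 64, 10  # i = hidden unit to remove
fc1, fc2 = nn.Linear(d_model, d_ff), nn.Linear(d_ff, d_model)
mlp = nn.Sequential(fc1, nn.GELU(), fc2)

with torch.no_grad():
    # Zero the unit's incoming weights/bias and its outgoing weights.
    fc1.weight[i].zero_()
    fc1.bias[i].zero_()
    fc2.weight[:, i].zero_()

# Remove the unit structurally and compare.
keep = [j for j in range(d_ff) if j != i]
small1, small2 = nn.Linear(d_model, d_ff - 1), nn.Linear(d_ff - 1, d_model)
with torch.no_grad():
    small1.weight.copy_(fc1.weight[keep])
    small1.bias.copy_(fc1.bias[keep])
    small2.weight.copy_(fc2.weight[:, keep])
    small2.bias.copy_(fc2.bias)
small_mlp = nn.Sequential(small1, nn.GELU(), small2)

x = torch.randn(2, 8, d_model)
with torch.no_grad():
    print((mlp(x) - small_mlp(x)).abs().max().item())  # ~0: zero-invariant
```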
@tianyic
I should only care about the maximum output difference, right? It's around 1e-6 or less.
For the pretrained model, I still get variation in the FLOP / param reduction between runs:
| Model | FLOP reduction | Param reduction |
|---|---|---|
| ConvNextTiny | 15-30% | 30-55% |
| ResNet18 | 55-80% | 65-90% |
If that is normal, I can close the issue.
Hello @tianyic,
I was running the sanity_check tests on test_convnexttiny.py and got different results despite using a fixed dummy_input.
Test Run 1, Test Run 2, Test Run 3: (logs omitted; each run reports different FLOP / param reduction numbers)
The difference is quite big, so I want to ask if it's normal.
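In case it helps, this is roughly how I understand a fully deterministic run would have to be set up — assuming (I am not sure) that the check also randomly samples which groups to zero out, so fixing dummy_input alone would not pin down the FLOP / param numbers:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Fix every RNG source the test might touch, not just the input tensor.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything(0)
dummy_input = torch.randn(1, 3, 224, 224)  # fixed input for the sanity check
```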