Closed by rabinadk1 1 year ago
This is the test case matrix comparing how the original and the modified code classify each parameter. Linear modules are on the whitelist; the other modules (LayerNorm, Embedding) are on the blacklist. That is the general idea behind it.
| Test Case | Module | Parameter | Original Code | Modified Code |
| --- | --- | --- | --- | --- |
| 1 | Linear | weight | decay | decay |
| 1 | Linear | bias | no_decay | no_decay |
| 2 | LayerNorm | weight | no_decay | no_decay |
| 2 | LayerNorm | bias | no_decay | no_decay |
| 3 | Embedding | weight | no_decay | no_decay |
| 4 | Custom | custom_param | N/A | decay |

Modifying the else should help fix it.
Code:

    decay = set()
    no_decay = set()
    blacklist_weight_modules = (torch.nn.LayerNorm, torch.nn.Embedding)
    for mn, m in self.named_modules():
        for pn, p in m.named_parameters():
            fpn = f"{mn}.{pn}" if mn else pn  # full param name
            if pn.endswith("bias") or isinstance(m, blacklist_weight_modules):
                no_decay.add(fpn)
            elif pn.endswith("weight"):
                decay.add(fpn)
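As a point of comparison, here is a minimal sketch of one other way to keep the two sets disjoint. It is not the change proposed in this thread: it keeps the same branches as the snippet above and only passes `recurse=False` to `named_parameters()`, so each module reports just the parameters it owns directly. The function name `split_decay_params` and the toy `Sequential` model are hypothetical, chosen only for illustration.

```python
import torch


def split_decay_params(model):
    """Return (decay, no_decay) sets of full parameter names."""
    blacklist_weight_modules = (torch.nn.LayerNorm, torch.nn.Embedding)
    decay, no_decay = set(), set()
    for mn, m in model.named_modules():
        # recurse=False yields only the parameters that m owns directly,
        # so a parameter is never classified through a parent container.
        for pn, p in m.named_parameters(recurse=False):
            fpn = f"{mn}.{pn}" if mn else pn  # full param name
            if pn.endswith("bias") or isinstance(m, blacklist_weight_modules):
                no_decay.add(fpn)
            elif pn.endswith("weight"):
                decay.add(fpn)
    return decay, no_decay


# Hypothetical toy model, used only to exercise the function.
model = torch.nn.Sequential(
    torch.nn.Embedding(10, 4),
    torch.nn.LayerNorm(4),
    torch.nn.Linear(4, 2),
)
decay, no_decay = split_decay_params(model)

all_names = {name for name, _ in model.named_parameters()}
print("overlap:", decay & no_decay)
print("unclassified:", all_names - (decay | no_decay))
```

With this toy model only the Linear weight should land in `decay`. A parameter whose name ends in neither `bias` nor `weight` (the "Custom" row of the table) would still be left unclassified by this sketch.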
Thanks, missed the logic there.
First, I thought the code below could be simplified by separating parameters for weight decay.
I found this code to be succinct. But interestingly, this code resulted in duplicate parameters.
Can anyone explain to me why the changed code results in duplicate parameters?
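One likely cause can be reproduced with a small sketch, assuming the changed code resembles the loop quoted in this thread. `named_modules()` yields every module, including parent containers, and `named_parameters()` recurses into children by default. A LayerNorm weight is therefore seen twice: once through its parent, which is not on the blacklist and so sends it to `decay`, and once through the LayerNorm itself, which sends it to `no_decay`. The toy `Sequential` model and the `seen_via` dictionary are hypothetical, added only to make the double visit visible.

```python
import torch

# Hypothetical toy model: a container holding one LayerNorm and one Linear.
model = torch.nn.Sequential(torch.nn.LayerNorm(4), torch.nn.Linear(4, 2))

blacklist_weight_modules = (torch.nn.LayerNorm, torch.nn.Embedding)
decay, no_decay = set(), set()
seen_via = {}  # full param name -> module types it was reached through

for mn, m in model.named_modules():
    # named_parameters() recurses by default, so the root Sequential
    # also yields every parameter owned by its children.
    for pn, p in m.named_parameters():
        fpn = f"{mn}.{pn}" if mn else pn  # full param name
        seen_via.setdefault(fpn, []).append(type(m).__name__)
        if pn.endswith("bias") or isinstance(m, blacklist_weight_modules):
            no_decay.add(fpn)
        elif pn.endswith("weight"):
            decay.add(fpn)

duplicates = decay & no_decay
print("in both sets:", duplicates)
print("0.weight reached via:", seen_via["0.weight"])
```

A loop that requires an `isinstance` check on every weight branch (a whitelist for `decay` as well as a blacklist for `no_decay`) would not show this overlap, because a weight reached through a parent container matches neither list and falls through.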