shiyf129 opened this issue 2 years ago (status: Open)
Hi!
The snippets you included look reasonable, but the widths you tested seem small if they correspond to the d_model of a Transformer.
Can you try larger widths and attach the coord check plots?
@shiyf129 I also think the snippets look reasonable. I have done coord checks on Swin as well, and I attach the plots here. Echoing Edward's suggestion, the widths tested are typically 256, 512, 1024, and 2048. Have you tried larger widths, and could you attach your coord check plots?
Hi, we are trying to use the mup tool to tune the Swin Transformer v2 model. I modified the Swin Transformer v2 code to adapt it to mup and ran the "save base shape" and "coordinate check" steps. The "coordinate check" results show that the model does not meet mup's requirements.
Does mup support the Swin Transformer v2 model?
In "swin_transformer_v2.py", I modified the following code (I did not change the attention scaling, because Swin Transformer v2 doesn't use "1/sqrt(d) attention scaling"):
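For context on the attention-scaling remark, here is a minimal standard-library sketch, not code from the Swin or mup repositories. It compares three attention logits as the width d grows, in the fully correlated case where the key equals the query: a dot product scaled by 1/sqrt(d), a dot product scaled by 1/d, and a cosine-similarity logit divided by a temperature, as in Swin Transformer v2's scaled cosine attention. The function names and the temperature `tau=0.1` are assumptions chosen for illustration. The sketch only shows that the cosine logit is bounded by 1/tau at any width. It does not settle whether the learnable temperature needs any width-dependent treatment under mup.

```python
import math
import random


def dot_logit(q, k, scale):
    """Dot-product attention logit with an explicit scale factor."""
    return sum(a * b for a, b in zip(q, k)) * scale


def cosine_logit(q, k, tau=0.1):
    """Cosine-similarity logit divided by a temperature tau.

    tau=0.1 is an arbitrary illustrative value, not a Swin v2 default.
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    return dot / (norm_q * norm_k) / tau


def compare_scalings(widths=(256, 1024, 4096), seed=0):
    """Return (d, logit/sqrt(d), logit/d, cosine logit) for each width.

    The key is set equal to the query, so q.k is the squared norm of q,
    which grows linearly in d for unit-variance entries.
    """
    rng = random.Random(seed)
    rows = []
    for d in widths:
        q = [rng.gauss(0.0, 1.0) for _ in range(d)]
        k = list(q)  # fully correlated query and key
        rows.append((
            d,
            dot_logit(q, k, 1.0 / math.sqrt(d)),  # grows like sqrt(d)
            dot_logit(q, k, 1.0 / d),             # stays near 1
            cosine_logit(q, k),                   # equals 1/tau at any d
        ))
    return rows


if __name__ == "__main__":
    for d, sqrt_scaled, d_scaled, cos in compare_scalings():
        print(f"d={d:5d}  1/sqrt(d): {sqrt_scaled:8.3f}  "
              f"1/d: {d_scaled:6.3f}  cosine/tau: {cos:6.3f}")
```

Because the cosine logit is normalized by the vector norms, its magnitude does not depend on d, which is consistent with leaving that part of the attention code unchanged.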
In Swin Transformer's "main.py", I added the "save base shape" and "coordinate check" functions.
The "coordinate check" results show only a small difference between "mup" and "SP". Sorry, I can't upload pictures. Could you please help us check whether mup supports the Swin Transformer v2 model, or whether there is some other cause? Thanks a lot.
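When plots cannot be attached, one way to share a coordinate check as text is to report, for each layer, the log-log slope of the mean absolute activation against width. A slope near 0 means the activation size is roughly width-independent, while a clearly positive or negative slope means it grows or shrinks with width. The helper below is a hypothetical sketch using only the standard library. It is not part of mup, and the magnitudes in the usage section are synthetic values made up to illustrate the two cases, not measurements from any model.

```python
import math


def width_slope(widths, magnitudes):
    """Least-squares slope of log2(magnitude) against log2(width).

    widths:     model widths used in the coordinate check
    magnitudes: mean absolute activation of one layer at each width
    """
    xs = [math.log2(w) for w in widths]
    ys = [math.log2(m) for m in magnitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    widths = [256, 512, 1024, 2048]
    # Synthetic example 1: magnitude independent of width.
    flat = [1.0, 1.0, 1.0, 1.0]
    # Synthetic example 2: magnitude growing like sqrt(width).
    growing = [math.sqrt(w) for w in widths]
    print("flat layer slope:   ", width_slope(widths, flat))
    print("growing layer slope:", width_slope(widths, growing))
```

Posting one slope per layer, for both the mup and SP runs, gives reviewers roughly the same information as the plots.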