Closed by buaahsh 1 year ago
Attention weight was originally designed for alignment models, so it is not necessary to include it in torchscale.