Open norikazu99 opened 1 month ago
Hello, and thank you for sharing the repo. I'd like to know whether muP would work out of the box with a Mamba model, or whether I would have to rescale some constants, as is done for attention_scores in transformers.