Closed jaminzzz closed 5 months ago
Hello, awesome work!

I'm confused as to why the vocabulary size (12) and the output dimension of the LM head (16) are inconsistent?

Looking forward to your reply!

---

This is because the Mamba/Caduceus config has a parameter named `pad_vocab_size_multiple`, which pads the vocab size up to a multiple of some number. We set this value to 8, so the model embedding / LM head output gets expanded from 12 to 16.
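For illustration, here is a minimal Python sketch of the round-up rule that `pad_vocab_size_multiple` applies; the helper name is hypothetical, only the rounding behavior is taken from the explanation above:

```python
def padded_vocab_size(vocab_size: int, pad_vocab_size_multiple: int) -> int:
    """Round vocab_size up to the nearest multiple of pad_vocab_size_multiple."""
    remainder = vocab_size % pad_vocab_size_multiple
    if remainder != 0:
        vocab_size += pad_vocab_size_multiple - remainder
    return vocab_size

# With the values from this issue: a vocab of 12 padded to a multiple of 8 gives 16.
print(padded_vocab_size(12, 8))  # 16
```

Padding to a multiple like 8 is a common trick to keep the embedding and LM head matrix dimensions aligned for efficient GPU kernels; the extra rows simply correspond to unused token IDs.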