kuleshov-group / caduceus

Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Apache License 2.0

The size of the vocab is inconsistent with the output dimension of the lm_head #17

Closed: jaminzzz closed this issue 5 months ago

jaminzzz commented 5 months ago

Hello, awesome work!

I'm confused about why the vocabulary size (12) and the output dimension of the LM head (16) are different.

Looking forward to your reply!

[Screenshot: 2024-04-12 22:42:12]
yair-schiff commented 5 months ago

This is because the Mamba/Caduceus config has a parameter named pad_vocab_size_multiple, which pads the vocabulary size up to a multiple of that number. We set this value to 8, so the model's embedding and LM head output dimensions are expanded from 12 to 16.
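The padding amounts to rounding the vocabulary size up to the next multiple of pad_vocab_size_multiple. Below is a minimal sketch of that arithmetic, assuming the rounding follows the usual "round up to a multiple" rule used in the upstream Mamba language-model code. The function name padded_vocab_size is hypothetical and does not come from the Caduceus repo.

```python
def padded_vocab_size(vocab_size: int, pad_vocab_size_multiple: int) -> int:
    """Round vocab_size up to the nearest multiple of pad_vocab_size_multiple.

    Hypothetical helper for illustration; assumes the same round-up rule
    that the config parameter pad_vocab_size_multiple describes.
    """
    remainder = vocab_size % pad_vocab_size_multiple
    if remainder != 0:
        # Add just enough extra slots to reach the next multiple.
        vocab_size += pad_vocab_size_multiple - remainder
    return vocab_size


# Case from this thread: 12 real tokens, padded to a multiple of 8.
print(padded_vocab_size(12, 8))  # 16
# A size that is already a multiple of 8 is left unchanged.
print(padded_vocab_size(16, 8))  # 16
```

With a 12-token vocabulary, the four extra output positions (indices 12 to 15) of the 16-wide LM head do not correspond to any token in the vocabulary.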