instadeepai / nucleotide-transformer

🧬 Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics
https://www.biorxiv.org/content/10.1101/2023.01.11.523679v2

Confusion about the max_positions value #62

Closed Cheerful0 closed 6 months ago

Cheerful0 commented 7 months ago

Hello, in the `get_pretrained_model` module you set `max_positions=32`. I don't understand what this value of 32 means — could you please tell me why it is set this way? Thanks

dallatt commented 6 months ago

Hello @Cheerful0,

The `max_positions` argument handles padding and acts as a safeguard. The tokenizer returned by `get_pretrained_model` pads every sequence to `max_positions` tokens and raises an error if a longer sequence is passed to it. That way, you are sure to provide the model with sequences of the appropriate length, and you don't have to handle batching the inputs yourself!
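For intuition, the pad-or-raise behavior described above can be sketched in a few lines. This is a hedged illustration, not the library's actual implementation: the function name `pad_to_max_positions` and the padding id `PAD_ID` are hypothetical.

```python
PAD_ID = 0  # hypothetical padding token id, for illustration only


def pad_to_max_positions(token_ids, max_positions=32):
    """Pad a list of token ids up to max_positions, or raise if it is too long.

    Mirrors the behavior described above: shorter sequences are padded,
    longer sequences trigger an error instead of being silently truncated.
    """
    if len(token_ids) > max_positions:
        raise ValueError(
            f"Sequence of length {len(token_ids)} exceeds "
            f"max_positions={max_positions}"
        )
    # Right-pad with the padding id so every sequence has the same length
    return token_ids + [PAD_ID] * (max_positions - len(token_ids))
```

Because every sequence comes out with exactly `max_positions` tokens, the batches you build from them are rectangular and can be stacked directly into a model input.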

Hope this helps, Hugo