jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models
MIT License
1.25k stars 110 forks

context length and dataset size #38

Open shossain opened 7 months ago

shossain commented 7 months ago

I am looking at the training command for Mistral: https://github.com/jquesnelle/yarn/blob/0ae3b2d73d47d28e8dd01142d62845643fe7c575/train.sh#L60

Can I train a 64k context length model with a 16k-long dataset, or is that command just an example?