Open shossain opened 7 months ago
I am looking at the training command for Mistral: https://github.com/jquesnelle/yarn/blob/0ae3b2d73d47d28e8dd01142d62845643fe7c575/train.sh#L60
Can I train a model with a 64k context length using a dataset of 16k-long sequences, or is that command just an example?