Closed hiqsociety closed 11 months ago
We don't know exactly what makes Mistral better, but likely it was just trained for longer - which is exactly what is being done with tinyllama.
This issue doesn't really make sense. Are you talking about the Mistral architecture, like sliding-window attention? The Mistral dataset is unreleased, and there are no comments on any part of it other than the 8T rumor. Without a plan or details, this just comes across as Mistral hype.
He likely means swapping Llama for Mistral, i.e. swapping the architecture and tokenizer while keeping the same project dataset.
@PhilippeFerreiraDeSousa yeah good point.
I'm not a huge fan of the Mistral architecture. My sense is the reduced attention is lossy. Other than that, it's not all that different from Llama, just a bit faster.
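For context on the "reduced attention" point: Mistral's sliding-window attention lets each token attend only to the most recent W positions instead of the full causal prefix, which is where the speedup (and the possible lossiness) comes from. A minimal sketch of the attention mask, with illustrative sequence length and window size:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to position j only if
    j <= i (causal) and i - j < window (sliding window)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

# With window=3, token 5 can only attend to positions 3, 4, 5,
# whereas full causal attention would allow 0 through 5.
mask = sliding_window_mask(seq_len=6, window=3)
```

Each row has at most `window` True entries, so attention cost per token stays bounded as the sequence grows; information from positions outside the window only reaches a token indirectly through stacked layers.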
@VatsaDev I just realised I like Mistral so much (I use it more than the rest of the 7Bs). I wasn't thinking clearly. I thought someone could figure out how to shrink Mistral to 1B or something.
Anyway, I've tested TinyLlama 1B and it's "great"; I hope to see the fixed version. The current one has a lot of repetitions.
Great work! Can you do a Mistral 1B TinyLlama? Mistral AI is good.