huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

[Feature] nanotron <-> conversion for Llama #124

Closed yardenas closed 4 months ago

yardenas commented 6 months ago

The same idea as in https://github.com/huggingface/nanotron/pull/103, but for a Llama model.

I'd be happy to implement this.

I need it for another project that uses nanotron, and I was wondering whether it's something you'd want in this repository. If so, I'll start working on an implementation here.
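For context, such a conversion is essentially a state-dict key remapping (plus any tensor transposes or splits the target layout needs). Below is a minimal sketch of the key-remapping half; the nanotron-side names here are purely illustrative assumptions, and a real implementation would take them from nanotron's Llama modeling code:

```python
import re

# Hypothetical mapping from HF Llama parameter names to nanotron-style names.
# The right-hand-side names are assumptions for illustration, not nanotron's
# actual layout.
_RULES = [
    (r"^model\.embed_tokens\.weight$", "token_embedding.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.(q|k|v|o)_proj\.weight$",
     r"decoder.\1.attention.\2_proj.weight"),
    (r"^model\.layers\.(\d+)\.mlp\.(gate|up|down)_proj\.weight$",
     r"decoder.\1.mlp.\2_proj.weight"),
    (r"^model\.layers\.(\d+)\.(input|post_attention)_layernorm\.weight$",
     r"decoder.\1.\2_layernorm.weight"),
    (r"^model\.norm\.weight$", "final_layernorm.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
]


def remap_key(hf_key: str) -> str:
    """Translate one HF Llama state-dict key to the (assumed) nanotron name."""
    for pattern, repl in _RULES:
        if re.match(pattern, hf_key):
            return re.sub(pattern, repl, hf_key)
    raise KeyError(f"no conversion rule for {hf_key!r}")
```

Applied over a full state dict, `{remap_key(k): v for k, v in sd.items()}` would give the renamed checkpoint; the harder part in practice is handling sharding and any fused/split projection weights, which PR #103 addresses for its model.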

Aside from the contribution guide, are there any other guidelines for this task?

Thanks!