huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

[Feature] nanotron <-> conversion for Llama #124

Closed yardenas closed 4 months ago

yardenas commented 6 months ago

The same idea as in https://github.com/huggingface/nanotron/pull/103, but for a Llama model.

I'd be happy to implement this.

I need it for another project that uses nanotron, and I was wondering whether it's something you'd want in this repository. If so, I'll start working on an implementation here.
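For context, such a conversion is essentially a state-dict key remapping (plus any tensor transposes or splits the target layout needs). Below is a minimal sketch of the key-remapping half; the nanotron-side names here are purely illustrative assumptions, and a real implementation would take them from nanotron's Llama modeling code:

```python
import re

# Hypothetical mapping from HF Llama parameter names to nanotron-style names.
# The right-hand-side names are assumptions for illustration, not nanotron's
# actual layout.
_RULES = [
    (r"^model\.embed_tokens\.weight$", "token_embedding.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.(q|k|v|o)_proj\.weight$",
     r"decoder.\1.attention.\2_proj.weight"),
    (r"^model\.layers\.(\d+)\.mlp\.(gate|up|down)_proj\.weight$",
     r"decoder.\1.mlp.\2_proj.weight"),
    (r"^model\.layers\.(\d+)\.(input|post_attention)_layernorm\.weight$",
     r"decoder.\1.\2_layernorm.weight"),
    (r"^model\.norm\.weight$", "final_layernorm.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
]


def remap_key(hf_key: str) -> str:
    """Translate one HF Llama state-dict key to the (assumed) nanotron name."""
    for pattern, repl in _RULES:
        if re.match(pattern, hf_key):
            return re.sub(pattern, repl, hf_key)
    raise KeyError(f"no conversion rule for {hf_key!r}")
```

Applied over a full state dict, `{remap_key(k): v for k, v in sd.items()}` would give the renamed checkpoint; the harder part in practice is handling sharding and any fused/split projection weights, which PR #103 addresses for its model.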

Aside from the contribution guide, are there any other guidelines for this task?

Thanks!