huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Where is the "nanotron format" defined? #176

Closed by RonanKMcGovern 4 months ago

RonanKMcGovern commented 4 months ago

I see that any(?) Hugging Face model can be converted to nanotron format with this script.

Is there documentation describing this format?

Can any model that may be loaded with AutoModelForCausalLM be converted to nanotron format for training?

yardenas commented 4 months ago

@RonanKMcGovern, not really tbh, it's tailored for Llama 2. We (@AleHD, @TJ-Solergibert) are working on an extension to Llama 3, but generally speaking it wouldn't work for an arbitrary Hugging Face model.
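
For anyone landing here later, a rough way to pre-check a checkpoint before attempting a conversion is to look at the `model_type` and `architectures` fields in its Hugging Face `config.json`. This is only a sketch and not part of nanotron: `SUPPORTED_ARCHITECTURES` and `looks_convertible` are hypothetical names, and the assumption that "Llama only" is the right filter comes from the reply above. Passing this check is necessary at best, not sufficient, since it cannot tell Llama 2 from Llama 3.

```python
import json

# Assumption (from the reply above): only Llama-style checkpoints are handled.
# This set is illustrative, not something nanotron exposes.
SUPPORTED_ARCHITECTURES = {"LlamaForCausalLM"}


def looks_convertible(config: dict) -> bool:
    """Return True if a parsed Hugging Face config.json declares a Llama causal LM."""
    architectures = config.get("architectures") or []
    return config.get("model_type") == "llama" and any(
        arch in SUPPORTED_ARCHITECTURES for arch in architectures
    )


# Minimal stand-ins for the relevant fields of two real config.json files.
llama_cfg = json.loads('{"model_type": "llama", "architectures": ["LlamaForCausalLM"]}')
gpt2_cfg = json.loads('{"model_type": "gpt2", "architectures": ["GPT2LMHeadModel"]}')

print(looks_convertible(llama_cfg))
print(looks_convertible(gpt2_cfg))
```

In practice you would load the dict with `json.load` from the checkpoint's `config.json` rather than from an inline string.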

RonanKMcGovern commented 4 months ago

Noted, thanks. I guess this is a powerful library to quickly test the performance of different datasets.