huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Integration with the HuggingFace Ecosystem #47

Closed woshiyyya closed 8 months ago

woshiyyya commented 8 months ago

This is so great! You guys are addressing a lot of pain points in large model training with simple APIs!

One question: what would integration with the HuggingFace ecosystem look like? For example:

NouamaneTazi commented 8 months ago

For now, most transformers architectures should be supported with minimal changes (like adding a PipelineBlock for PP, or column/row parallel linears for TP). We'll have easy-to-follow templates for that soon, but for now you can use llama as an example :)
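To make the column/row linear idea concrete, here is a minimal NumPy sketch of the math behind tensor-parallel linears (plain NumPy for illustration, not nanotron's actual `TensorParallelColumnLinear`/`TensorParallelRowLinear` classes; shapes and the tp=2 split are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # a batch of activations
W = rng.standard_normal((8, 6))  # full weight of a linear layer

# Column parallelism: each rank holds a column shard of W and computes
# a slice of the output; the slices are concatenated (all-gather).
W_cols = np.split(W, 2, axis=1)           # two shards of shape (8, 3)
y_col = np.concatenate([x @ shard for shard in W_cols], axis=1)

# Row parallelism: each rank holds a row shard of W and the matching
# slice of the input; the partial outputs are summed (all-reduce).
W_rows = np.split(W, 2, axis=0)           # two shards of shape (4, 6)
x_parts = np.split(x, 2, axis=1)
y_row = sum(xp @ shard for xp, shard in zip(x_parts, W_rows))

# Both sharding schemes reproduce the unsharded result.
assert np.allclose(y_col, x @ W)
assert np.allclose(y_row, x @ W)
```

Chaining a column-parallel linear into a row-parallel one (as in a transformer MLP) needs only one communication at the end, which is why the two are usually paired.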

While training, we save checkpoints in the safetensors format, which enables resuming training from them. They can also easily be adapted to HF's transformers, depending on your model's modeling code!