Closed · woshiyyya closed this issue 8 months ago
For now, most Transformers architectures should be supported with minimal changes (like adding `PipelineBlock` for PP, or column/row parallel linears for TP). We'll have easy-to-follow templates for that soon, but for now we can use llama as an example :)
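A rough sketch of the kind of change meant here, assuming Nanotron exposes column/row parallel linears under `nanotron.parallel` with roughly these constructor arguments (the import path and signatures are inferred from the llama example mentioned above, not guaranteed):

```python
import torch.nn as nn

# Assumed import path -- check the llama example in the repo for the real one.
from nanotron.parallel.tensor_parallel.nn import (
    TensorParallelColumnLinear,
    TensorParallelRowLinear,
)

class MyMLP(nn.Module):
    """An MLP block adapted for tensor parallelism."""

    def __init__(self, hidden_size: int, intermediate_size: int, tp_pg):
        super().__init__()
        # TP change: swap plain nn.Linear for column-/row-parallel linears, so
        # the up-projection is sharded over output features and the
        # down-projection over input features across the TP process group.
        self.up_proj = TensorParallelColumnLinear(hidden_size, intermediate_size, pg=tp_pg)  # assumed signature
        self.down_proj = TensorParallelRowLinear(intermediate_size, hidden_size, pg=tp_pg)  # assumed signature
        self.act = nn.SiLU()

    def forward(self, hidden_states):
        return self.down_proj(self.act(self.up_proj(hidden_states)))

# For PP, modules like this would additionally be wrapped in a PipelineBlock
# so they can be assigned to a pipeline stage.
```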
While training we save checkpoints in safetensors format, which enables resuming training from them. They can also easily be adapted to HF's transformers, depending on your model's modeling code!
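For instance, a minimal sketch of loading such a checkpoint back into an HF transformers model. The checkpoint path, model name, and key-remapping rule below are hypothetical; the real mapping (and whether the checkpoint is sharded across ranks) depends on your modeling code:

```python
import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint path; Nanotron's actual layout may differ.
state_dict = load_file("checkpoints/step_1000/model.safetensors")

def remap_key(key: str) -> str:
    # Hypothetical rename from Nanotron parameter names to the names the
    # HF modeling code expects.
    return key.replace("model.decoder.", "model.layers.")

hf_state_dict = {remap_key(k): v for k, v in state_dict.items()}

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
missing, unexpected = model.load_state_dict(hf_state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```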
This is so great! You guys are addressing a lot of pain points in large model training with simple APIs!
One question: what would integration with the Hugging Face ecosystem look like? For example, can we load a model with the `from_pretrained` method and have it automatically converted to a `NanotronModel`? Or do we need to manually define a model class in Nanotron style and write some custom loading logic?
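To make the two options concrete (everything Nanotron-side here is hypothetical; only the HF call is real):

```python
from transformers import AutoModelForCausalLM

# Option 1 (the hoped-for path): load via from_pretrained and convert
# automatically. `convert_hf_to_nanotron` is a hypothetical helper -- nothing
# in this thread says it exists.
hf_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# nanotron_model = convert_hf_to_nanotron(hf_model, parallel_config=...)

# Option 2: define the model in Nanotron style by hand, then write custom
# loading logic that copies/remaps hf_model.state_dict() into it, along the
# lines of the key-remapping sketch above.
```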