Closed · woshiyyya closed this issue 8 months ago
For now, most Transformers architectures should be supported with minimal changes (like adding `PipelineBlock` for PP, or column/row parallel linears for TP). We'll have easy-to-follow templates for that soon, but for now we can use llama as an example :)
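A rough sketch of the kind of change meant here, assuming Nanotron exposes column/row parallel linears under `nanotron.parallel` with roughly these constructor arguments (the import path and signatures are inferred from the llama example mentioned above, not guaranteed):

```python
import torch.nn as nn

# Assumed import path -- check the llama example in the repo for the real one.
from nanotron.parallel.tensor_parallel.nn import (
    TensorParallelColumnLinear,
    TensorParallelRowLinear,
)

class MyMLP(nn.Module):
    """An MLP block adapted for tensor parallelism."""

    def __init__(self, hidden_size: int, intermediate_size: int, tp_pg):
        super().__init__()
        # TP change: swap plain nn.Linear for column-/row-parallel linears, so
        # the up-projection is sharded over output features and the
        # down-projection over input features across the TP process group.
        self.up_proj = TensorParallelColumnLinear(hidden_size, intermediate_size, pg=tp_pg)  # assumed signature
        self.down_proj = TensorParallelRowLinear(intermediate_size, hidden_size, pg=tp_pg)  # assumed signature
        self.act = nn.SiLU()

    def forward(self, hidden_states):
        return self.down_proj(self.act(self.up_proj(hidden_states)))

# For PP, modules like this would additionally be wrapped in a PipelineBlock
# so they can be assigned to a pipeline stage.
```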
While training we save checkpoints in safetensors format, which enables resuming training from them. They can also easily be adapted to HF's transformers, depending on your model's modeling code!
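For instance, a minimal sketch of loading such a checkpoint back into an HF transformers model. The checkpoint path, model name, and key-remapping rule below are hypothetical; the real mapping (and whether the checkpoint is sharded across ranks) depends on your modeling code:

```python
import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint path; Nanotron's actual layout may differ.
state_dict = load_file("checkpoints/step_1000/model.safetensors")

def remap_key(key: str) -> str:
    # Hypothetical rename from Nanotron parameter names to the names the
    # HF modeling code expects.
    return key.replace("model.decoder.", "model.layers.")

hf_state_dict = {remap_key(k): v for k, v in state_dict.items()}

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
missing, unexpected = model.load_state_dict(hf_state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```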
This is so great! You guys are addressing a lot of pain points in large model training with simple APIs!
One question: what would integration with the Hugging Face ecosystem look like? For example, can we load a model with the `from_pretrained` method and have it automatically converted to a `NanotronModel`? Or do we need to manually define a model class in Nanotron style and write some custom loading logic?
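To make the two options concrete (everything Nanotron-side here is hypothetical; only the HF call is real):

```python
from transformers import AutoModelForCausalLM

# Option 1 (the hoped-for path): load via from_pretrained and convert
# automatically. `convert_hf_to_nanotron` is a hypothetical helper -- nothing
# in this thread says it exists.
hf_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# nanotron_model = convert_hf_to_nanotron(hf_model, parallel_config=...)

# Option 2: define the model in Nanotron style by hand, then write custom
# loading logic that copies/remaps hf_model.state_dict() into it, along the
# lines of the key-remapping sketch above.
```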