The README mentions this codebase can act as a "reference for enthusiasts keen on pretraining language models under 5 billion parameters". I'm wondering if you could give a brief guide on how to do so, assuming we start from a transformers config and tokenizer. Something like:
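For concreteness, here is a rough sketch of the starting point I have in mind, using only standard transformers APIs (the config/tokenizer identifier is just a placeholder, and the final step would be whatever pretraining entry point this repo exposes):

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

# Placeholder identifier -- any HF config/tokenizer for a model under 5B parameters.
config = AutoConfig.from_pretrained("my-org/my-1b-config")
tokenizer = AutoTokenizer.from_pretrained("my-org/my-1b-config")

# Build the model with randomly initialized weights (pretraining from scratch),
# rather than loading pretrained weights via from_pretrained().
model = AutoModelForCausalLM.from_config(config)

# ...then hand `model` and `tokenizer` to this codebase's pretraining pipeline
# (data tokenization, packing, and the training loop itself).
```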
Would a lot of work be required to adapt the codebase to support this?