jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0
529 stars, 33 forks

Danube2 and Unsloth offloaded gradient ck #15

Closed by jzhang38 2 months ago.