huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Refacto generate #199

Closed · 3outeille closed this 4 months ago

3outeille commented 5 months ago

It doesn't support micro-batches yet, but I don't think they are necessary for our use case (which is just sanity checking). Plus, this makes the code easier to understand.

TODO: clean rotary for inference

3outeille commented 4 months ago

Closing in favor of a new one: #202