huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Some fixes for `Refactors and fixes #25` #28

Closed 3outeille closed 10 months ago

3outeille commented 10 months ago

Refactors and fixes #25

Fix a bug: after enlarging the rotary embedding frequency table, the KV cache must be enlarged as well, so that `flash_attn_with_kvcache` doesn't overwrite `position_idx`.
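A rough sketch of the idea, in plain Python rather than the actual nanotron code: whenever the rotary table grows, grow the KV cache to at least the same length before any write. `DecodeBuffers`, `ensure_capacity`, and `write` are hypothetical names used only for illustration, and nothing here calls `flash_attn_with_kvcache`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecodeBuffers:
    """Hypothetical stand-in for per-layer decoding state (not nanotron's API)."""

    rotary_len: int = 0  # number of positions covered by the rotary table
    kv_cache: List[float] = field(default_factory=list)  # one slot per position

    def ensure_capacity(self, needed: int) -> None:
        # Grow the rotary table first (never shrink it).
        if needed > self.rotary_len:
            self.rotary_len = needed
        # Then grow the KV cache to match, keeping existing entries,
        # so both buffers always cover the same range of positions.
        missing = self.rotary_len - len(self.kv_cache)
        if missing > 0:
            self.kv_cache.extend([0.0] * missing)

    def write(self, position: int, value: float) -> None:
        # A write past the end of the cache is an error in this sketch.
        if position >= len(self.kv_cache):
            raise IndexError(f"position {position} is outside the KV cache")
        self.kv_cache[position] = value


bufs = DecodeBuffers()
bufs.ensure_capacity(4)
bufs.write(3, 1.5)
bufs.ensure_capacity(8)  # both buffers grow together
bufs.write(7, 2.5)
```

The point of the sketch is only the ordering: the cache is resized in the same step as the rotary table, so the two never disagree about how many positions are valid.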

3outeille commented 10 months ago

xDD