jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0
529 stars, 33 forks

Modify interface #5

Closed. jzhang38 closed this 3 months ago

jzhang38 commented 3 months ago
  1. Modify the interface (see the Usage section in the README).
  2. Fix a small bug in dist_flash_attn. dist_flash_attn and zigzag_ring_attn now produce the same loss.
[Screenshot, 2024-04-07, 2:07:40 PM]
jzhang38 commented 3 months ago
[Screenshot, 2024-04-07, 4:09:15 PM]

Same loss.
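
For readers who want to check the "same loss" claim numerically rather than by eye from screenshots, here is a minimal sketch of one way to compare two logged loss curves. It is not code from this PR: the helper names `max_loss_gap` and `losses_match`, the tolerances, and the loss values in the demo are all hypothetical placeholders, and it assumes you have already collected per-step losses from a dist_flash_attn run and a zigzag_ring_attn run with identical data and seeds.

```python
import math
from typing import Sequence


def max_loss_gap(losses_a: Sequence[float], losses_b: Sequence[float]) -> float:
    """Return the largest absolute per-step difference between two loss curves."""
    if len(losses_a) != len(losses_b):
        raise ValueError("loss curves must cover the same number of steps")
    return max((abs(a - b) for a, b in zip(losses_a, losses_b)), default=0.0)


def losses_match(
    losses_a: Sequence[float],
    losses_b: Sequence[float],
    rel_tol: float = 1e-3,  # assumed tolerance; tune for your precision (bf16, fp16, ...)
    abs_tol: float = 1e-6,
) -> bool:
    """Return True if every step's losses agree within the given tolerances."""
    if len(losses_a) != len(losses_b):
        raise ValueError("loss curves must cover the same number of steps")
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(losses_a, losses_b)
    )


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT taken from the PR screenshots.
    dist_flash_attn_losses = [2.50, 2.25, 2.00]
    zigzag_ring_attn_losses = [2.50, 2.25, 2.00]

    print("max gap:", max_loss_gap(dist_flash_attn_losses, zigzag_ring_attn_losses))
    print("match:", losses_match(dist_flash_attn_losses, zigzag_ring_attn_losses))
```

A tolerance-based comparison is used here instead of exact equality because two sequence-parallel attention implementations may accumulate floating-point results in a different order, so small per-step differences would not necessarily indicate a bug.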