Closed · forhaoliu closed this 11 months ago
Added a blockwise parallel transformer for training on sequences 32x longer than a vanilla transformer and 4x longer than memory-efficient attention / FlashAttention. Includes blockwise attention, blockwise FFN, and blockwise loss.
Tests passed.
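To illustrate the core idea, here is a minimal JAX sketch of what blockwise attention fused with a blockwise FFN can look like. It is not the kernel from this PR: it assumes single-head attention, hypothetical block sizes `q_block`/`kv_block`, and illustrative weight names `w1`/`w2`, and uses the standard running-softmax trick from memory-efficient attention.

```python
import jax
import jax.numpy as jnp

def blockwise_ffn(x, w1, w2):
    # FFN applied to one query block at a time, so only a
    # block-sized activation is ever materialized.
    return jnp.dot(jax.nn.gelu(jnp.dot(x, w1)), w2)

def blockwise_attn_ffn(q, k, v, w1, w2, q_block=256, kv_block=256):
    # q, k, v: (seq_len, dim); w1: (dim, hidden); w2: (hidden, dim).
    seq_len, dim = q.shape
    scale = dim ** -0.5
    outputs = []
    for qs in range(0, seq_len, q_block):
        qb = q[qs:qs + q_block] * scale
        # Running softmax statistics accumulated over key/value blocks.
        acc = jnp.zeros((qb.shape[0], dim))
        row_max = jnp.full((qb.shape[0],), -jnp.inf)
        row_sum = jnp.zeros((qb.shape[0],))
        for ks in range(0, seq_len, kv_block):
            kb, vb = k[ks:ks + kv_block], v[ks:ks + kv_block]
            scores = qb @ kb.T                        # (q_block, kv_block)
            new_max = jnp.maximum(row_max, scores.max(-1))
            correction = jnp.exp(row_max - new_max)   # rescale old partials
            p = jnp.exp(scores - new_max[:, None])
            acc = acc * correction[:, None] + p @ vb
            row_sum = row_sum * correction + p.sum(-1)
            row_max = new_max
        attn_out = acc / row_sum[:, None]
        # Fusing the FFN into the same query-block loop is what lets
        # the FFN avoid a full-sequence activation as well.
        outputs.append(attn_out + blockwise_ffn(attn_out, w1, w2))
    return jnp.concatenate(outputs, axis=0)
```

The key design point is that both the attention and the feedforward pass operate on one query block at a time, so peak activation memory scales with the block size rather than the full sequence length.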