Blog article on the new feature that enables packing of instruction-tuning examples while still using Flash Attention 2, via the new DataCollatorWithFlattening. It can provide up to a 2x improvement in training throughput while maintaining convergence quality.
@osanseviero FYI
Authors are @RhuiDih @ArthurZucker @achikundu @wynterl @raghukiran1224 @mayank31398
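
For reviewers unfamiliar with the feature, here is a rough plain-Python sketch of what padding-free packing looks like. It is an illustration only, not the DataCollatorWithFlattening implementation: the function name `flatten_examples` and the toy token ids are made up for this example. The idea is that examples are concatenated into one sequence, position ids restart at 0 for each example so attention can be kept within example boundaries, and the first label of each example is masked.

```python
from typing import Dict, List

# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def flatten_examples(examples: List[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Concatenate tokenized examples into one padding-free sequence.

    Hypothetical helper for illustration; not part of any library.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    position_ids: List[int] = []
    for ex in examples:
        ids = ex["input_ids"]
        # Fall back to the input ids when no labels are given.
        ex_labels = ex.get("labels", ids)
        input_ids += ids
        # Mask the first token of each example so no loss is
        # computed across an example boundary.
        labels += [IGNORE_INDEX] + list(ex_labels[1:])
        # Positions restart at 0 for every example.
        position_ids += list(range(len(ids)))
    return {"input_ids": input_ids, "labels": labels, "position_ids": position_ids}


batch = flatten_examples([{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9]}])
print(batch)
```

Because nothing is padded, every token in the batch is a real token, which is where the throughput gain described in the article comes from.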