mosaicml / examples

Fast and flexible reference benchmarks
Apache License 2.0

Fix slicing for padding + cache #257

Closed by dakinggg 1 year ago

dakinggg commented 1 year ago

Fixes a bug in the position id adjustment when kv caching and padding are used together. Previously, the slice was applied to the attention_mask == 0 term inside the cumsum, but it should have been applied outside the cumsum, to its result. The bug only had an effect when kv caching and padding were combined, so it should not affect training, only generation.
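To illustrate why the order of slicing and cumsum matters, here is a minimal pure-Python sketch for a single sequence. It is not the code from this PR: the function name `position_ids`, the `slice_before_cumsum` flag, and the list-based representation are hypothetical, and it assumes left padding with an attention mask that covers both the cached and the new tokens.

```python
from itertools import accumulate


def position_ids(attention_mask, past_len, slice_before_cumsum=False):
    """Position ids for the new tokens of one left-padded sequence.

    attention_mask: list of 0/1 over cached + new tokens (0 = padding).
    past_len: number of tokens already held in the kv cache.
    slice_before_cumsum: hypothetical flag that reproduces the old ordering.
    """
    is_pad = [1 if m == 0 else 0 for m in attention_mask]
    if slice_before_cumsum:
        # Old ordering: padding inside the cached prefix is sliced away
        # before it can be counted.
        pad_count = list(accumulate(is_pad[past_len:]))
    else:
        # Fixed ordering: count padding over the whole sequence, then keep
        # only the entries that belong to the new tokens.
        pad_count = list(accumulate(is_pad))[past_len:]
    raw = range(past_len, len(attention_mask))
    # Shift each raw position left by the padding seen so far, clamped at 0.
    return [max(p - c, 0) for p, c in zip(raw, pad_count)]


# Two padding tokens, three cached real tokens, one new token.
mask = [0, 0, 1, 1, 1, 1]
fixed = position_ids(mask, past_len=5)
buggy = position_ids(mask, past_len=5, slice_before_cumsum=True)
print(fixed, buggy)  # [3] [5]
```

With the fixed ordering the new token gets position 3, matching its index among the real tokens. With the old ordering the two cached padding tokens are never counted, so the position is off by two. Without a cache (`past_len=0`) or without padding, both orderings agree, which is consistent with the bug only showing up during generation.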