replicate / cog-llama-template

LLaMA Cog template
Apache License 2.0

Improve data packing support #17

Closed joehoover closed 1 year ago

joehoover commented 1 year ago

Packing is currently hard-coded into preprocessing, and it would be better for it to be optional.

The current implementation also splits individual examples across packed sequences, which is undesirable for small datasets with specific formatting.

We should re-implement packing so that examples are treated as atomic units. This will introduce variable sequence lengths, but it will ensure that input data formats are explicitly respected.
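
For illustration, here is a minimal sketch of what atomic packing could look like. It is not the repository's code: the function name `pack_examples`, the `max_len` and `pack` parameters, and the greedy strategy are all assumptions made for this example. It takes already-tokenized examples, concatenates whole examples until the next one would overflow `max_len`, and never splits an example.

```python
from typing import List


def pack_examples(
    examples: List[List[int]],
    max_len: int,
    pack: bool = True,
) -> List[List[int]]:
    """Greedily pack whole tokenized examples into sequences.

    Hypothetical helper, not part of cog-llama-template.
    Each example is kept intact, so output sequences vary in length.
    """
    if not pack:
        # Packing disabled: one sequence per example.
        return [list(ex) for ex in examples]

    packed: List[List[int]] = []
    current: List[int] = []
    for ex in examples:
        # Close the current sequence if this example would not fit.
        if current and len(current) + len(ex) > max_len:
            packed.append(current)
            current = []
        # An example longer than max_len still goes in whole, on its own;
        # truncating it is left to the caller.
        current = current + list(ex)
    if current:
        packed.append(current)
    return packed


if __name__ == "__main__":
    token_ids = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
    print(pack_examples(token_ids, max_len=5))
    print(pack_examples(token_ids, max_len=5, pack=False))
```

Because the resulting sequences differ in length, a batch would still need padding (or length-based bucketing) before being passed to the model.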

joehoover commented 1 year ago

Solved in https://github.com/replicate/cog-llama-template/pull/18