Packing is currently hard-coded into preprocessing, and it would be better for it to be optional.
The current implementation also splits examples across sequence boundaries, which is not desirable for small datasets with specific formatting.
We should re-implement packing so that examples are treated as atomic units. This will introduce variable sequence lengths, but it will ensure that input data formats are respected.
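
As a rough illustration of the proposal, here is a minimal sketch of greedy packing that never splits an example. The names `pack_examples`, `max_len`, and `enabled` are hypothetical and not taken from the existing code. The sketch assumes examples are already tokenized into lists of token ids, and it assumes an example longer than `max_len` should be emitted alone rather than truncated.

```python
from typing import Iterable, List


def pack_examples(
    examples: Iterable[List[int]],
    max_len: int,
    enabled: bool = True,
) -> List[List[int]]:
    """Greedily pack whole examples into sequences of at most max_len tokens.

    Hypothetical helper: examples are treated as atomic units, so the
    resulting sequences have variable lengths.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    # Packing is optional: when disabled, each example is its own sequence.
    if not enabled:
        return [list(ex) for ex in examples]

    packed: List[List[int]] = []
    current: List[int] = []
    for ex in examples:
        ex = list(ex)
        # Start a new sequence if this example would overflow the current one.
        if current and len(current) + len(ex) > max_len:
            packed.append(current)
            current = []
        # Assumption: an oversized example is kept whole in its own sequence.
        current.extend(ex)
    if current:
        packed.append(current)
    return packed


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7], [8, 9, 10, 11, 12, 13]]
    print(pack_examples(data, max_len=5))
    # → [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11, 12, 13]]
```

The output sequences have lengths 3, 4, and 6, so downstream batching would need padding or some other way of handling variable lengths. A real implementation would probably also insert separator tokens between packed examples and track example boundaries for attention masking, which this sketch leaves out.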