Open · fzyzcjy opened this issue 1 month ago
@fzyzcjy It's entirely possible it's the padding of the large batches that's slowing things down - one trick is to set group_by_length = True to reduce padding
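For reference, a minimal sketch of where the flag goes (assuming the usual transformers TrainingArguments setup; the other values here are just placeholders):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    group_by_length=True,  # bucket samples of similar length together to reduce per-batch padding
)
```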
@danielhanchen I see, thank you!
Btw, I wonder whether group_by_length
will be harmful to model performance, e.g. because it means the gradient of each step is computed from samples of similar lengths.
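To make the concern concrete, here is a toy sketch of what length grouping does to batch composition (illustrative only, not the actual sampler implementation, which also adds some shuffling):

```python
# Sorting samples by length before batching means each batch - and hence each
# step's gradient - is computed from samples of similar length.
lengths = [512, 12, 498, 9, 505, 15, 490, 11]
order = sorted(range(len(lengths)), key=lambda i: lengths[i])
batches = [order[i:i + 2] for i in range(0, len(order), 2)]  # batch_size = 2
for batch in batches:
    print([lengths[i] for i in batch])
# -> [9, 11], [12, 15], [490, 498], [505, 512]
```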
Yes, sadly it will hurt the training process
I see. Thank you for the explanation!
(Btw, I am happy to open a PR implementing https://github.com/unslothai/unsloth/issues/1021, feel free to ping me if needed)
Hi, thanks for the library! When using Unsloth to SFT a Llama-3.2-1B on a 4090D, I found something interesting: changing the batch size from 1 to 4 does not speed up training.
I ran three configuration experiments, with per-device batch size 1, 2, and 4 (each step processing 16 samples in total).
Speed: 100 steps (i.e. 100×16 samples) take 70s, 70s, and 75s respectively.
Normally a batch size as small as 1 is expected to give lower throughput. Thus I am creating this issue in case it points to something in Unsloth that can be further optimized, i.e. making the batch_size=2 or 4 cases faster than the batch_size=1 case.
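For concreteness, a minimal sketch of such a benchmark (the dataset, checkpoint name, text field, and the gradient-accumulation pairing that keeps 16 samples per step are illustrative assumptions, since the exact script is not shown above):

```python
from unsloth import FastLanguageModel  # import unsloth before transformers/trl

import time
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("yahma/alpaca-cleaned", split="train")  # assumed dataset

for bsz, accum in [(1, 16), (2, 8), (4, 4)]:  # three configs, 16 samples/step
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-1B-Instruct",  # assumed checkpoint
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(model, r=16)
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="output",  # plain-text field in this dataset
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=bsz,
            gradient_accumulation_steps=accum,
            max_steps=100,
            logging_steps=25,
            output_dir=f"outputs_bsz{bsz}",
        ),
    )
    start = time.perf_counter()
    trainer.train()
    print(f"bsz={bsz} accum={accum}: {time.perf_counter() - start:.0f}s for 100 steps")
```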