huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

SFT training doesn't fully go through all samples #61

Open hanxiaotian opened 7 months ago

hanxiaotian commented 7 months ago

Current training uses ConstantLengthDataset. This dataset returns a fixed-length sequence of tokens (2048) at every step; however, the total number of steps is calculated from the number of samples. I checked some samples and found that quite a few of them are much longer than 2048 tokens (~7000), which means that some of the samples are never fully seen in one epoch of training.

Could you please verify if my understanding is correct?

Thanks, I appreciate it.
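To make the packing behaviour concrete, here is a minimal sketch of what is meant above (an assumed simplification, not TRL's actual ConstantLengthDataset implementation):

```python
# Minimal sketch of the packing behaviour described above: tokenized examples
# are concatenated, separated by the EOS token, and emitted as fixed-length
# chunks of seq_length tokens. Any tail shorter than seq_length is dropped,
# and the number of chunks depends on the total token count, not on the
# number of raw samples.
def pack_examples(tokenized_examples, eos_token_id, seq_length=2048):
    buffer = []
    for ids in tokenized_examples:           # each `ids` is a list of token ids
        buffer.extend(ids + [eos_token_id])  # append the sample, then the EOS separator
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]        # one fixed-length training chunk
            buffer = buffer[seq_length:]

# A ~7000-token sample contributes roughly 3 chunks, whereas a step count
# derived from the number of samples would budget only 1 for it.
```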

lewtun commented 7 months ago

Hello @hanxiaotian, yes, there is a small bug in TRL's SFTTrainer in how the training steps are counted; it is being fixed here: https://github.com/huggingface/trl/pull/979
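For intuition, here is a rough illustration of the step-count discrepancy described in the issue (all numbers are assumptions, not measurements from the handbook's datasets):

```python
# With packing, the number of optimisation steps per epoch should follow the
# total token count, not the raw sample count. Assumed numbers for illustration:
num_samples = 10_000
avg_tokens_per_sample = 3_000   # assumption: many samples exceed 2048 tokens
seq_length = 2048
batch_size = 8

steps_from_samples = num_samples // batch_size                          # 1250 (per-sample estimate)
total_tokens = num_samples * avg_tokens_per_sample
steps_from_packed_chunks = (total_tokens // seq_length) // batch_size   # ~1831 (per-chunk estimate)

print(steps_from_samples, steps_from_packed_chunks)
```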

hanxiaotian commented 7 months ago

Another quick question: after tokens from different samples are concatenated, separated by the EOS token, the loss is calculated over the whole sequence without any mask. Is my understanding correct? Thanks!
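For reference, here is a simplified sketch of the loss computation the question refers to (an assumed illustration, not TRL's exact code): with packed sequences, the labels are typically a copy of the input ids, so cross-entropy covers every position in the chunk, including tokens that originally belonged to different samples.

```python
import torch
import torch.nn.functional as F

def packed_lm_loss(logits, input_ids):
    # logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len)
    labels = input_ids.clone()        # no mask separating the packed samples
    shift_logits = logits[:, :-1, :]  # predict token t+1 from position t
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```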

Randl commented 7 months ago

So the fix is merged, but there is no release yet; once there is a release, the requirements should be updated to the new version of TRL.