If I add a debug print of batches at the site of the error in data_loader.py (line 604), I get the output below. Should the parameter 'return_loss': True be in this batch, and should the function be concatenating across dictionaries like this?
[{'input_ids_chosen': tensor([[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0],
...,
[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0]]), 'attention_mask_chosen': tensor([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]]), 'input_ids_rejected': tensor([[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0],
...,
[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0],
[ 101, 4243, 131, ..., 0, 0, 0]]), 'attention_mask_rejected': tensor([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]]), 'return_loss': True}]
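For context, here is a minimal sketch of the per-key concatenation that data_loader.py appears to be doing here. This is not the library's actual code (the concatenate name and structure are my assumptions), but it reproduces the failure as soon as a bool is in the batch dict:

```python
import torch

def concatenate(batches):
    # Sketch of per-key concatenation over a list of dict batches;
    # not the library's actual implementation.
    out = {}
    for key in batches[0]:
        values = [b[key] for b in batches]
        if not isinstance(values[0], torch.Tensor):
            # 'return_loss': True lands in this branch and raises
            raise TypeError(f"Can only concatenate tensors but got {type(values[0])}")
        out[key] = torch.cat(values, dim=0)
    return out

batch = {
    "input_ids_chosen": torch.zeros(2, 8, dtype=torch.long),
    "return_loss": True,  # the bool set by the collator
}
concatenate([batch, batch])
# TypeError: Can only concatenate tensors but got <class 'bool'>
```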
Commenting out the return_loss key on the batch dictionary in RewardDataCollatorWithPadding (trainer/utils.py line 256) resolves this error. However, I have no idea what this parameter does, so I don't know whether this should become a pull request. If someone more involved understands how this function is meant to work, please let me know. I'll keep this issue open in the meantime in case anyone else comes across it.
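For reference, here is the same workaround expressed as a wrapper instead of an edit to trainer/utils.py. The subclass name is mine, and I don't know whether dropping the flag has side effects elsewhere in the Trainer:

```python
from trl.trainer.utils import RewardDataCollatorWithPadding

class StreamingSafeRewardCollator(RewardDataCollatorWithPadding):
    """Hypothetical wrapper: pad as usual, then drop the bool flag
    that the streaming data loader cannot concatenate."""

    def __call__(self, features):
        batch = super().__call__(features)
        batch.pop("return_loss", None)
        return batch
```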
Thank you.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
This may be the same issue as: TypeError: Can only concatenate tensors but got <class 'bool'> when using dpo_trainer. The map to token_id does not remove the original columns!
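If the leftover raw columns are the problem, the usual fix is to drop them during the map. A minimal self-contained sketch with toy column names:

```python
from datasets import Dataset

ds = Dataset.from_dict({"chosen": ["good answer"], "rejected": ["bad answer"]})
# Drop the original string columns so only tensor-compatible keys reach the collator
ds = ds.map(
    lambda ex: {"chosen_len": len(ex["chosen"]), "rejected_len": len(ex["rejected"])},
    remove_columns=ds.column_names,
)
print(ds.column_names)  # ['chosen_len', 'rejected_len']
```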
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I think this issue should be reopened, because Reward Modelling and DPO do not work with an IterableDataset. This error still persists.
Hi,
I am trying to apply reward modelling to an IterableDataset, and I am running into a strange failure mode that I am struggling to debug. I can replicate the same stack trace with the reward_modeling.py example below by making the following changes:
- load_dataset -> set streaming=True
- train_dataset.map -> remove num_proc, since it is invalid for an IterableDataset
- ScriptArguments.reward_config -> add max_steps=10000, which is required for an IterableDataset
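For concreteness, a sketch of those three changes (the dataset, model name, and preprocessing are placeholders, not necessarily what the example script uses):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # placeholder model

def preprocess(examples):
    chosen = tokenizer(examples["chosen"], truncation=True)
    rejected = tokenizer(examples["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

# 1) stream the dataset instead of downloading it
train_dataset = load_dataset("Anthropic/hh-rlhf", split="train", streaming=True)
# 2) no num_proc here: IterableDataset.map does not accept it
train_dataset = train_dataset.map(preprocess, batched=True)
# 3) in the reward config, set max_steps=10000, since an IterableDataset has no length
```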
The code fails with 'TypeError: Can only concatenate tensors but got <class 'bool'>'. I have attached the code and the stack trace, but it is difficult to debug inside the trainer.train() call. I thought it might be the lazy evaluation creating new columns and confusing the trainer, but even after forcing the new columns to be created at initialization, training gives the same error.
If anyone has any ideas about why this happens, please let me know. Thank you!