huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

"Step must be 1" in DPODataCollatorWithPadding #1304

Closed corbyrosset closed 8 months ago

corbyrosset commented 9 months ago

When I use the alignment handbook to run DPO (https://github.com/huggingface/alignment-handbook/blob/main/scripts/run_dpo.py) while loading a proprietary on-disk dataset stored in .jsonl format, like so:

from datasets import DatasetDict, load_dataset

raw_datasets = DatasetDict()
cache_dir = training_args.output_dir
if data_args.train_path is not None:
    # Note: set_format() mutates in place and returns None, so its result
    # must not be assigned; with_format() returns the formatted dataset.
    raw_datasets["train"] = load_dataset(
        "json", data_files=data_args.train_path, streaming=False, cache_dir=cache_dir
    )["train"].with_format("python")

instead of using its default `get_datasets()`:

raw_datasets = get_datasets(data_args, splits=data_args.dataset_splits)

I encounter an error in `DPODataCollatorWithPadding` (in `trl/trainer/utils.py`), which expects each feature to be a list rather than a tensor. The error is "step must be 1", which makes sense because PyTorch tensors don't support negative step values in slicing, unlike Python lists or NumPy arrays:

if "prompt" in k:
    to_pad = [torch.LongTensor(ex[k][::-1]) for ex in features]
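A minimal repro of the mismatch (with made-up token IDs, not data from the collator itself): the `[::-1]` slice reverses a Python list, but the same slice on a tensor raises.

```python
import torch

# Reversing a Python list with a negative-step slice is fine:
assert [101, 2023, 102][::-1] == [102, 2023, 101]

# The same slice on a tensor raises a ValueError, because PyTorch
# slicing does not support negative steps:
ids = torch.tensor([101, 2023, 102])
try:
    ids[::-1]
except ValueError as e:
    print(f"ValueError: {e}")
```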

I fixed this by using `torch.flip` to reverse the elements when they arrive as a tensor rather than a list:

to_pad = [
    torch.LongTensor(ex[k][::-1]) if isinstance(ex[k], list)
    else torch.LongTensor(torch.flip(ex[k], [0]))
    for ex in features
]
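The same idea can be factored into a small helper (a sketch for illustration only, not the patch that landed in trl; the function name is made up) that normalizes a feature to a reversed LongTensor whether it arrives as a list or a tensor:

```python
import torch

def reversed_long_tensor(x):
    """Return x reversed as a LongTensor, accepting a list or a tensor.

    Hypothetical helper for illustration; not part of trl itself.
    """
    if isinstance(x, torch.Tensor):
        # Tensors don't support negative-step slicing; torch.flip copies.
        return torch.flip(x, dims=[0]).long()
    return torch.LongTensor(x[::-1])

# Works for both input kinds:
print(reversed_long_tensor([1, 2, 3]))                # tensor([3, 2, 1])
print(reversed_long_tensor(torch.tensor([1, 2, 3])))  # tensor([3, 2, 1])
```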

I guess my question is: somewhere along the way, the features were getting converted to tensors. I don't know exactly where, or whether this affects anything else.

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.