Closed: Coobiw closed this issue 7 months ago
Solved. In the DeepSpeed source code, `batch[0]` is used for the inputs and `batch[1]` for the labels. So the `collate_fn` should output:

```python
return ((new_batch['image'], data_dict['input_ids'], data_dict['labels'], data_dict['attention_mask']),
        data_dict['labels'])
```
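A minimal sketch of such a `collate_fn` (the field names `image`, `input_ids`, `labels`, and `attention_mask` follow the snippet above; the per-sample dict layout and tensor shapes are hypothetical assumptions):

```python
from typing import Dict, List, Tuple
import torch

def collate_fn(samples: List[Dict[str, torch.Tensor]]) -> Tuple[Tuple[torch.Tensor, ...], torch.Tensor]:
    """Pack a batch into the (inputs, labels) pair the pipeline engine expects.

    DeepSpeed unpacks batch[0] as the inputs fed to the first pipe layer and
    batch[1] as the labels passed to the loss function, so everything the first
    stage needs must be grouped into a single tuple at index 0.
    """
    images = torch.stack([s["image"] for s in samples])                   # (B, C, H, W)
    input_ids = torch.stack([s["input_ids"] for s in samples])            # (B, T)
    labels = torch.stack([s["labels"] for s in samples])                  # (B, T)
    attention_mask = torch.stack([s["attention_mask"] for s in samples])  # (B, T)
    # batch[0]: tuple of inputs for the first pipe stage; batch[1]: labels.
    return (images, input_ids, labels, attention_mask), labels
```

With this layout, every tensor the first stage needs travels inside `batch[0]`, and the labels are still available at `batch[1]` for the loss function.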
Sorry for the interruption!
When I use the Pipeline Parallel feature, I hit an error related to the `data_iter`. The following is my code:

The `collate_fn` is customized and returns a `Tuple[Tensor, Tensor, Tensor, Tensor]`, like this:

I print the size of each item, like this:

However, I find that the input of the first pipe layer only collects `next(iter(training_batch))[0]`, i.e. `new_batch['image']`, which caused:

This confuses me. How can I solve it? Thanks for your help!
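The behavior described above can be illustrated with a simplified sketch (not DeepSpeed's actual code): when the `collate_fn` returns a flat 4-tuple, only index 0 is treated as the first stage's input, so the remaining tensors never reach the model.

```python
import torch

def split_batch(batch):
    # Simplified view of how a pipeline engine consumes a batch:
    # index 0 -> inputs for the first pipe layer, index 1 -> labels for the loss.
    inputs, labels = batch[0], batch[1]
    return inputs, labels

# A flat 4-tuple from a custom collate_fn (shapes are hypothetical):
flat = (
    torch.zeros(2, 3, 224, 224),  # image
    torch.zeros(2, 16),           # input_ids
    torch.zeros(2, 16),           # labels
    torch.ones(2, 16),            # attention_mask
)
inputs, _ = split_batch(flat)
# Only the image tensor is taken as the first stage's input;
# the other three tensors are silently ignored.
```

This is why grouping all first-stage inputs into a nested tuple at index 0, as in the accepted answer, resolves the issue.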