Hi,

I'm trying to use PiPPy with a custom model that takes both 'input_ids' and 'labels' as inputs. To check for this functionality, I modified the basic pippy_gpt2.py example by first changing the model_class and model_name to GPT2LMHeadModel, and then setting include_loss_args to True in the function call used to generate example_inputs:

```python
example_inputs = generate_inputs_for_model(model_class, gpt2, model_name, args.batch_size, args.device, include_loss_args=True)
```
However, this fails with the following traceback:

```
[rank0]: TypeError: forward() got an unexpected keyword argument 'labels'
[rank0]: RuntimeError:
[rank0]: [Stage 0] failed to run forward:
[rank0]: args: ()
[rank0]: kwargs: {'input_ids': 'Tensor(torch.Size([1, 1024]), grad=False)', 'labels': 'Tensor(torch.Size([1, 1024]), grad=False)'}
```
This occurs because PiPPy splits the graph module (split_gm) such that the labels input is sent to the last (4th) submodule, so the first submodule is not expecting an input 'labels'.
I also tried to modify pippy_gpt2.py to insert the labels at the last submodule's schedule.step call (although this is not optimal as a long-term solution).
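Roughly, the change looked like the sketch below (a non-runnable fragment; variable names like rank, world_size, and example_inputs follow the pippy_gpt2.py example, but the exact wiring in my local copy may differ):

```python
# Sketch: pass 'labels' only on the rank that owns the last stage, since
# split_gm routed the 'labels' placeholder to the final submodule.
if rank == 0:
    # First stage consumes the regular model inputs.
    schedule.step(input_ids=example_inputs["input_ids"])
elif rank == world_size - 1:
    # Last stage: feed 'labels' in directly as a keyword argument.
    out = schedule.step(labels=example_inputs["labels"])
else:
    # Interior stages receive activations from the previous stage.
    schedule.step()
```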
This throws the following error, probably because internal submodules expect RecvInfo and tensors from previous stages rather than new values from input placeholders?

```
[rank3]: AssertionError: Expected RecvInfo but got <class 'torch.distributed.pipelining._PipelineStage.RootArgPlaceholder'>
```
I could try to debug further, but is there a better solution or does anyone have any ideas for how to implement this? Thanks.