net = AlexNet(num_classes=10)
net = PipelineModule(layers=join_layers(net),
loss_fn=torch.nn.CrossEntropyLoss(),
num_stages=args.pipeline_parallel_size,
partition_method=part,
activation_checkpoint_interval=0)
This seems to run-over the forward module that you built in your AlexNet module, which makes me wonder about the possibility of having skip-connections in my module while using DeepSpeed's Pipeline-Parallelism optimizer.
In your example you convert the AlexNet into a list of layers:
which is later inserted to PipelineModule
This seems to run-over the forward module that you built in your AlexNet module, which makes me wonder about the possibility of having skip-connections in my module while using DeepSpeed's Pipeline-Parallelism optimizer.
Many thanks!