pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
https://pytorch.org/examples
BSD 3-Clause "New" or "Revised" License

Error while running distributed/rpc/parameter_server #856

Open Weigaa opened 3 years ago

Weigaa commented 3 years ago

When I run this demo with 3 or more nodes (for example, 1 PS and 2 workers), the following error occurs:

RuntimeError: Error on Node 0: one of the variables needed for gradient computation has been modified by an inplace operation: [CPUFloatType [64, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).


rootlu commented 3 years ago

Hi, you can refer to here.
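
For context on the error message in this issue, here is a minimal single-process sketch of how autograd raises it. This is not the parameter_server example itself, and the thread does not confirm the root cause. The sketch assumes that one worker's optimizer step updates a shared weight in place while another worker's backward pass is still pending. The function name `reproduce_inplace_error` and the tensor `x` are hypothetical; only the weight shape [64, 32, 3, 3] is taken from the error text.

```python
import torch
import torch.nn.functional as F


def reproduce_inplace_error():
    """Trigger the autograd version-counter check in a single process.

    Returns the RuntimeError message, or None if no error was raised.
    """
    # Hypothetical stand-in for a conv weight held by the parameter server;
    # the shape matches the one reported in the error message.
    weight = torch.randn(64, 32, 3, 3, requires_grad=True)

    # Stand-in for activations from an earlier layer. Because it requires
    # grad, the backward pass needs the saved weight to compute its gradient.
    x = torch.randn(1, 32, 8, 8, requires_grad=True)

    # Forward pass: autograd saves `weight` together with its current version.
    loss = F.conv2d(x, weight).sum()

    # Assumption: another worker's optimizer step lands here, before this
    # worker has called backward. Optimizers update parameters in place,
    # which bumps the tensor's version counter.
    with torch.no_grad():
        weight.add_(0.01)

    try:
        # The saved weight no longer matches the version recorded in forward.
        loss.backward()
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(reproduce_inplace_error())
```

If this is what happens in the multi-worker run, the in-place update has to be kept from interleaving with a pending backward pass on the same parameters. Wrapping the run in torch.autograd.set_detect_anomaly(True), as the hint in the error suggests, should help locate the forward operation whose saved tensor was modified.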