Hello, I'm training a model with PopTorch on a DELL DSS8840. I need to use pipelining because I'll have to train it on high-resolution images, and that won't fit on a single IPU. However, when I split the model over more than 2 IPUs, I get the following error:
This error doesn't appear if I pipeline over 2 IPUs; it only appears when I set the number of IPUs to 4 or more. It also doesn't go away if I manually increase target.hostSyncTimeout to 1200 or above.
Below is a complete example to reproduce the error:
Let me know if there's something I'm missing here.
Hello Cy-r0,
Thank you for letting us know about this! You are registered on Graphcore's Support platform, so I have taken the liberty of turning your message into a Support ticket in your name, which will help us process it faster. You should have received an email at the address associated with your Support account.
For clarity, we will leave this GitHub issue open until it has been resolved to your satisfaction.
Resolved by updating to the latest firmware.
Cy-r0 closed this issue 3 years ago.