graphcore / popart

Poplar Advanced Runtime for the IPU

use multi-IPUs on multi-host (model parallelism) #8

Open zli2014 opened 1 year ago

zli2014 commented 1 year ago

I want to use multiple IPUs for model inference. Should I use TensorFlow (https://github.com/graphcore/tensorflow/tree/085b20a4b6287eff8c0b792425d52422ab8cbab3/tensorflow/compiler/plugin/poplar) or PopTorch (https://github.com/graphcore/poptorch)?

yiakwy commented 1 year ago

@zli2014 @lin72h As far as I can share, Graphcore (GC) used Phabricator internally for a long time before the recent plan to migrate to GitHub to meet compliance requirements.

The best way to discuss this is to join the Slack channel at graphcorecommunity.slack.com and talk directly with Graphcore experts, or to write an email to Graphcore IT help; a ticket will then be created for you internally.

Mark and many other people there are experts you can seek help from.

As for your question:

Use PopART pipeline features with the PopART framework directly

  1. PopART multi-IPU usage

The code you shared is an XLA device plugin designed exclusively for TensorFlow 1.x/2.x in this generation of solutions.

PopART, on the other hand, serves an ONNX frontend as a standalone ML framework; PopTorch (a PyTorch fork), PopXL (explicit transforms and control), and PopRT (a lock-free task-queue backend for inference in the data center) are built on top of it.

I can assure you that PopART is a wonderfully collaborative, interdisciplinary engineering effort, and a fully functional ML framework for an ONNX frontend, as you can see in https://docs.graphcore.ai/projects/popart-user-guide/en/latest/.

Hence, you can directly use PopART multi-device solutions such as pipelining and data parallelism for training/inference scenarios. Both use Poplar GCL to compile Poplar programs for the different devices.

Using multiple IPUs is as simple as creating a multi-IPU device and setting a few session options such as virtualGraphMode, enablePipelining and the overlap-IO settings: https://docs.graphcore.ai/projects/popart-user-guide/en/latest/api-python.html#session-options
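To make that concrete, here is a minimal sketch, assuming SDK 3.x-style PopART Python bindings (keyword names such as dataFlow and enablePipelining may differ slightly between releases, and the tiny two-layer model is purely illustrative): it builds a small ONNX graph with the Builder, acquires two IPUs, and lets the automatic virtual-graph transform shard the model across them.

```python
import numpy as np
import popart

# Build a toy two-layer model with the PopART Builder (ONNX frontend).
builder = popart.Builder()
x = builder.addInputTensor(popart.TensorInfo("FLOAT", [1, 64]))
w0 = builder.addInitializedInputTensor(np.random.rand(64, 64).astype(np.float32))
w1 = builder.addInitializedInputTensor(np.random.rand(64, 64).astype(np.float32))
h = builder.aiOnnx.relu([builder.aiOnnx.matmul([x, w0])])
y = builder.aiOnnx.relu([builder.aiOnnx.matmul([h, w1])])
builder.addOutputTensor(y)

opts = popart.SessionOptions()
opts.virtualGraphMode = popart.VirtualGraphMode.Auto  # auto-shard across the IPUs
# opts.enablePipelining = True  # optionally pipeline the stages
#                               # (batchesPerStep must cover the pipeline depth)

# Acquire 2 physical IPUs (use createIpuModelDevice for a simulated device).
device = popart.DeviceManager().acquireAvailableDevice(2)
dataflow = popart.DataFlow(1, {y: popart.AnchorReturnType("All")})

session = popart.InferenceSession(
    fnModel=builder.getModelProto(),
    dataFlow=dataflow,
    deviceInfo=device,
    userOptions=opts,
)
session.prepareDevice()

# Run one step of inference and read the anchored output back on the host.
anchors = session.initAnchorArrays()
stepio = popart.PyStepIO({x: np.random.rand(1, 64).astype(np.float32)}, anchors)
session.run(stepio)
print(anchors[y])
```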

Here are examples:

- C++ (simple): https://github.com/graphcore/popart/blob/sdk-release-3.3/tests/integration/auto_virtual_graph_tests/auto_virtual_graph_relu_on_weight_test_0.cpp
- Python (complex, MLPerf 2.0 PopART): https://github.com/mlcommons/training_results_v2.0/tree/main/Graphcore/benchmarks/bert/implementations/popart

  2. Multi-IPU algorithms supported

Mainly supported: pipelining (automatic, training/inference), replicated data parallelism (training only), and tensor parallelism (experimental, application driven); a minimal sketch of the replication options follows below.

Since back-propagation ops cost much more than forward ops, an interleaved pipeline schedule is also supported; this behaves slightly differently in PopART than in the TensorFlow Poplar XLA backend.
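For the replicated data-parallel case mentioned above, a minimal options sketch, assuming the standard popart.SessionOptions fields enableReplicatedGraphs and replicatedGraphCount:

```python
import popart

opts = popart.SessionOptions()
opts.enableReplicatedGraphs = True  # run N identical copies of the graph
opts.replicatedGraphCount = 2       # one copy per IPU (data parallelism)
# For training, gradients are all-reduced across the replicas via GCL;
# the host-side batch fed through PyStepIO must cover all replicas.
```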

PopTorch / next generation of torch support (pending)

Instead of using PopART directly, you can use PopTorch to write PyTorch code to run your models; see the sketch below. There are already plenty of examples in the graphcore/examples repo.
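As a rough illustration (not taken from the examples repo), a minimal PopTorch sketch assuming a recent poptorch release: the toy model is split across two IPUs with poptorch.BeginBlock and wrapped with poptorch.inferenceModel for pipelined inference.

```python
import torch
import poptorch

class TwoStageNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 0 on IPU 0, stage 1 on IPU 1 (model parallelism / pipelining).
        self.fc1 = poptorch.BeginBlock(torch.nn.Linear(64, 64), ipu_id=0)
        self.fc2 = poptorch.BeginBlock(torch.nn.Linear(64, 10), ipu_id=1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

opts = poptorch.Options()
opts.deviceIterations(4)  # keep both pipeline stages busy

# Wrap the torch model for IPU inference; the host batch covers the
# device iterations (4 x micro-batch of 1 in this toy example).
model = poptorch.inferenceModel(TwoStageNet(), opts)
out = model(torch.randn(4, 64))
print(out.shape)
```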

You can get this from the SDK release (the framework is also open source now).

PopRT (customers only)

You can use PopRT for highly efficient inference in data center environments.

PopRT chooses the best PopART/model runtime configuration for the data center environment (NUMA nodes, (R)DMA support); all you need to do is supply an ONNX model: simple, easy and powerful.

You can get this from the SDK release too.