harvard-cns / teal

Codebase for Teal (SIGCOMM 2023)
MIT License

Teal: Traffic Engineering Accelerated by Learning

Teal is a learning-accelerated traffic engineering (TE) algorithm for cloud wide-area networks (WANs), published at ACM SIGCOMM '23. By harnessing the parallel processing power of GPUs, Teal achieves unprecedented acceleration of TE control, running several orders of magnitude faster than production TE solvers while retaining near-optimal flow allocations.

Getting started

Hardware requirements

*The baseline TE schemes require only a CPU. Teal also runs on a CPU, but its runtime is significantly longer than on a GPU.

Cloning Teal with submodules

Dependencies

Dependencies only required for baselines

Dependencies only required for Teal

Code structure

.
├── lib                     # source code for Teal (details in lib/README.md)
├── pop-ncflow-lptop        # submodule for baselines
│   ├── benchmarks          # test code for baselines
│   ├── ext                 # external code for baselines
│   └── lib                 # source code for baselines
├── run                     # test code for Teal
├── topologies              # network topologies with link capacity (e.g. `B4.json`)
│   └── paths               # paths in topologies (auto-generated if missing)
└── traffic-matrices        # TE traffic matrices
    ├── real                # real traffic matrices from abilene.txt in Yates (https://github.com/cornell-netlab/yates)
    │                       # (e.g. `B4.json_real_0_1.0_traffic-matrix.pkl`)
    └── toy                 # toy traffic matrices (e.g. `ASN2k.json_toy_0_1.0_traffic-matrix.pkl`)

Note: As we are not allowed to share the proprietary traffic data from Microsoft WAN (or the Teal model trained on that data), we mapped the publicly accessible Yates traffic data to the B4 topology to facilitate code testing. For the other topologies (UsCarrier, Kdl, and ASN), we synthetically generated "toy" traffic matrices due to their larger sizes.
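The traffic-matrix file names in the tree above appear to pack several fields into one string. As a minimal sketch, assuming the layout is `<topology>_<model>_<index>_<scale>_traffic-matrix.pkl` and that each file is an ordinary pickle (both are assumptions inferred from the two example names, not confirmed by the repository), a file could be inspected as follows. The helper names `parse_tm_filename` and `load_traffic_matrix` are hypothetical.

```python
import pickle
from pathlib import Path

SUFFIX = "_traffic-matrix.pkl"


def parse_tm_filename(name):
    """Split e.g. 'B4.json_real_0_1.0_traffic-matrix.pkl' into its fields.

    Assumed layout: <topology>_<model>_<index>_<scale>_traffic-matrix.pkl
    """
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a traffic-matrix file: {name}")
    stem = name[: -len(SUFFIX)]
    # Split from the right so underscores in the topology name are preserved.
    topo, model, index, scale = stem.rsplit("_", 3)
    return {"topo": topo, "model": model, "index": int(index), "scale": float(scale)}


def load_traffic_matrix(path):
    """Unpickle one traffic matrix (only unpickle files you trust)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


print(parse_tm_filename("B4.json_real_0_1.0_traffic-matrix.pkl"))
```

The type of the unpickled object is not stated in this README, so check it (for example with `type(...)`) before relying on a particular shape.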

Evaluating Teal

To evaluate Teal on the B4 topology:

$ cd ./run
$ python teal.py --obj total_flow --topo B4.json --epochs 3 --admm-steps 2
Loading paths from pickle file ~/teal/topologies/paths/path-form/B4.json-4-paths_edge-disjoint-True_dist-metric-min-hop-dict.pkl
path_dict size: 132
Creating model teal-models/B4.json_flowGNN-6_std-False.pt
Training epoch 0/3: 100%|█████████████████████████████████| 1/1 [00:01<00:00,  1.63s/it]
Training epoch 1/3: 100%|█████████████████████████████████| 1/1 [00:00<00:00,  2.45it/s]
Training epoch 2/3: 100%|█████████████████████████████████| 1/1 [00:00<00:00,  2.61it/s]
Testing: 100%|████████████████| 8/8 [00:00<00:00, 38.06it/s, runtime=0.0133, obj=0.9537]
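The final `Testing` progress line in the log above reports `runtime` and `obj` as `key=value` pairs. A minimal sketch for pulling those numbers out of captured console output follows. The function name `parse_metrics` is hypothetical, and the units and exact meaning of the two values are not stated here, so treat the parsed numbers as a convenience rather than an authoritative record.

```python
import re

# Matches pairs such as "runtime=0.0133" or "obj=0.9537" in a progress-bar postfix.
PAIR = re.compile(r"(\w+)=([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_metrics(line):
    """Return the numeric key=value pairs found in one log line."""
    return {key: float(value) for key, value in PAIR.findall(line)}


line = "Testing: 100%|████| 8/8 [00:00<00:00, 38.06it/s, runtime=0.0133, obj=0.9537]"
metrics = parse_metrics(line)
print(metrics)
```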

To see a description of the input parameters:

$ python teal.py --help

Results will be saved in

Realistic traffic matrices are only available for B4 (please refer to the note above). For the other topologies, namely UsCarrier (UsCarrier.json), Kdl (Kdl.json), and ASN (ASN2k.json), use the "toy" traffic matrices we generated (taking UsCarrier as an example):

$ python teal.py --obj total_flow --topo UsCarrier.json --tm-model toy --epochs 3 --admm-steps 2
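To run the same evaluation over several topologies, the commands above can be generated programmatically. The sketch below only assembles argument lists from the flags shown in this README (`--obj`, `--topo`, `--tm-model`, `--epochs`, `--admm-steps`). The helper `teal_command` is hypothetical, and omitting `--tm-model` for B4 simply mirrors the B4 command shown earlier.

```python
import shlex

# Topologies for which only "toy" traffic matrices are provided.
TOY_TOPOLOGIES = {"UsCarrier.json", "Kdl.json", "ASN2k.json"}


def teal_command(topo, obj="total_flow", epochs=3, admm_steps=2):
    """Build the argument list for one teal.py evaluation run."""
    cmd = ["python", "teal.py", "--obj", obj, "--topo", topo]
    if topo in TOY_TOPOLOGIES:
        cmd += ["--tm-model", "toy"]
    cmd += ["--epochs", str(epochs), "--admm-steps", str(admm_steps)]
    return cmd


for topo in ["B4.json", "UsCarrier.json"]:
    print(shlex.join(teal_command(topo)))
```

To launch a run rather than print it, the list can be passed to `subprocess.run(cmd, cwd="run", check=True)` from the repository root, which corresponds to the `cd ./run` step above.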

Evaluating baselines

Teal is compared with the following baselines:

To evaluate the baselines on B4, run the following commands from the project root:

$ cd ./pop-ncflow-lptop/benchmarks
$ python path_form.py --obj total_flow --topos B4.json
$ python top_form.py --obj total_flow --topos B4.json
$ python ncflow.py --obj total_flow --topos B4.json
$ python pop.py --obj total_flow --topos B4.json --algo-cls PathFormulation --split-fractions 0.25 --num-subproblems 4

Results will be saved in

To test on UsCarrier (UsCarrier.json), Kdl (Kdl.json), or ASN (ASN2k.json), specify the "toy" traffic matrices we generated (taking UsCarrier as an example):

$ python path_form.py --obj total_flow --tm-models toy --topos UsCarrier.json
$ python top_form.py --obj total_flow --tm-models toy --topos UsCarrier.json
$ python ncflow.py --obj total_flow --tm-models toy --topos UsCarrier.json
$ python pop.py --obj total_flow --tm-models toy --topos UsCarrier.json --algo-cls PathFormulation --split-fractions 0.25 --num-subproblems 4
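The baseline scripts in the commands above take plural flags (`--topos`, `--tm-models`), unlike `teal.py` (`--topo`, `--tm-model`). As a sketch, the four baseline invocations can be assembled for a given topology as follows. `baseline_commands` is a hypothetical helper that only reproduces the arguments shown in this section.

```python
# Extra arguments that only the pop.py command carries in this README.
POP_ARGS = ["--algo-cls", "PathFormulation",
            "--split-fractions", "0.25", "--num-subproblems", "4"]
SCRIPTS = ["path_form.py", "top_form.py", "ncflow.py", "pop.py"]


def baseline_commands(topo, tm_model=None, obj="total_flow"):
    """Build one argument list per baseline script for a topology."""
    cmds = []
    for script in SCRIPTS:
        cmd = ["python", script, "--obj", obj]
        if tm_model is not None:
            cmd += ["--tm-models", tm_model]
        cmd += ["--topos", topo]
        if script == "pop.py":
            cmd += POP_ARGS
        cmds.append(cmd)
    return cmds


for cmd in baseline_commands("UsCarrier.json", tm_model="toy"):
    print(" ".join(cmd))
```

These commands are meant to be run from `pop-ncflow-lptop/benchmarks`, as in the B4 example.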

Extending Teal

To add another TE implementation to this repo,

Citation

If you use our code in your research, please cite our paper:

@inproceedings{teal,
    title={Teal: Learning-Accelerated Optimization of WAN Traffic Engineering},
    author={Xu, Zhiying and Yan, Francis Y. and Singh, Rachee and Chiu, Justin T. and Rush, Alexander M. and Yu, Minlan},
    booktitle={Proceedings of the ACM SIGCOMM 2023 Conference},
    pages={378--393},
    month=sep,
    year={2023}
}