jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning
Apache License 2.0
687 stars · 90 forks

optimize pytorch computation graph for training #32

Open knsong opened 4 years ago

knsong commented 4 years ago

Hi, thanks for the great work. But does it support optimizing a PyTorch computation graph for faster training? If so, are there any benchmarks?

jiazhihao commented 4 years ago

@knsong Thanks for your interest in TASO. We support optimizing PyTorch graphs by transforming the graph to ONNX format using torch.onnx. For training, please set the batch_size, which is typically the first dimension of the input tensors in ONNX graphs, to the desired number.

Note that TASO currently only considers forward operators, so you will get a graph optimized for forward processing.

knsong commented 4 years ago

> @knsong Thanks for your interest in TASO. We support optimizing PyTorch graphs by transforming the graph to ONNX format using torch.onnx. For training, please set the batch_size, which is typically the first dimension of the input tensors in ONNX graphs, to the desired number.
>
> Note that TASO currently only considers forward operators, so you will get a graph optimized for forward processing.

Did you ever train the optimized graph using PyTorch or TensorFlow?

jiazhihao commented 4 years ago

TASO optimizes for inference performance (i.e., minimizing forward processing time). The optimized graph is mathematically equivalent to the original graph, so it can also be used for training, though it is tuned for inference.

We are currently working on adding training cost into the cost model, and will update this thread once the training support is ready.

knsong commented 4 years ago

> TASO optimizes for inference performance (i.e., minimizing forward processing time). The optimized graph is mathematically equivalent to the original graph, so it can also be used for training, though it is tuned for inference.

I notice that TASO can merge a conv and a following BN into a single conv for best inference efficiency, but that's not suitable for training.
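The Conv+BN folding mentioned here relies on the BN statistics being frozen: in inference mode BN is a fixed per-channel affine transform, which folds into the conv's weights and bias. A minimal NumPy sketch (per-channel scalar weights stand in for full kernels; all values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified per-channel "conv": y[c] = w[c] * x[c] + b[c]. The same folding
# applies to full convolution kernels, channel by channel.
C = 4
w = rng.standard_normal(C)
b = rng.standard_normal(C)

# BatchNorm parameters with fixed running statistics (inference mode).
gamma = rng.standard_normal(C)
beta = rng.standard_normal(C)
mean = rng.standard_normal(C)
var = rng.random(C) + 0.1
eps = 1e-5

x = rng.standard_normal((8, C))  # batch of 8 feature vectors

# Unfused: conv followed by BN.
y_conv = x * w + b
y_ref = gamma * (y_conv - mean) / np.sqrt(var + eps) + beta

# Fused: fold the BN scale and shift into the conv weights and bias.
scale = gamma / np.sqrt(var + eps)
w_fused = w * scale
b_fused = (b - mean) * scale + beta
y_fused = x * w_fused + b_fused

assert np.allclose(y_ref, y_fused)
```

During training the BN statistics change every step, so this folding is indeed invalid there, which is why an inference-optimized graph can be unsuitable for training.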

> We are currently working on adding training cost into the cost model, and will update this thread once the training support is ready.

Thanks. Looking forward to it.