I'd like to share a tutorial that teaches, step by step, how to train an AI model in a distributed manner.
Motivation and Context
As deep learning models grow larger, training them in a non-distributed manner becomes difficult. This tutorial clearly explains:
What distributed training is
How to use efficient parallelization techniques to run and speed up AI model training
It also provides some advanced tutorials that help me define my own parallel model.
It is interesting and helpful for deep learning researchers.
How Has This Been Tested?
There are many popular parallelization techniques for distributed training, but I did not have a general picture of them. This tutorial gave me a clear view of these advanced parallelization techniques, and after reading it I can apply all of them to my own code. I feel I can write distributed deep learning models just as I write a model on my laptop. Most importantly, it can greatly reduce my training time. Hence, I recommend it to anyone who needs to train deep learning models.
Types of changes
[ ] Content Update (change which fixes an issue or updates an already existing submission)
[x] New Article (change which adds functionality)
[ ] Documentation change
Checklist:
[x] My code follows the code style of this project.
[x] I have updated the documentation accordingly.
[x] I have read the CONTRIBUTING document.
[x] I have made checks to ensure URLs and other resources are valid.