huawei-noah / vega

AutoML tools chain
http://www.noahlab.com.hk/opensource/vega/

How does the local master launch the trainer in distributed mode? #135

Closed jacob1017 closed 2 years ago

jacob1017 commented 3 years ago

The pipeline consists of multiple steps. The Search_pipe_step searches for multiple configurations to optimize (HPO / network). Currently our search algorithm defines a Generator to sample configurations restricted by the search space; the trainer then evaluates those configurations and feeds the results back to the search algorithm. My question is: how can we configure the trainer to support multi-GPU?

jacob1017 commented 3 years ago

Sometimes our experiments need multiple GPUs with specified hyper-parameters.

zhangjiajin commented 3 years ago

@jacob1017

general:
    parallel_search: True
    parallel_fully_train: True
    devices_per_trainer: 2   # number of GPUs per trainer with the specified hyper-parameters
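
For reference, a minimal sketch of where these general settings would sit in a full pipeline configuration. The step names, pipe_step types, and the backend key below are illustrative assumptions based on typical Vega example configs, not taken from this issue; only parallel_search, parallel_fully_train, and devices_per_trainer come from the answer above.

general:
    backend: pytorch             # assumed backend; set to your framework
    parallel_search: True        # evaluate sampled configurations on trainers in parallel
    parallel_fully_train: True   # also parallelize the fully-train step
    devices_per_trainer: 2       # GPUs assigned to each trainer process

pipeline: [nas, fully_train]     # assumed step names, for illustration only

nas:
    pipe_step:
        type: SearchPipeStep     # samples configurations and dispatches them to trainers

fully_train:
    pipe_step:
        type: TrainPipeStep      # trains the selected configuration(s) to completion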