yuqinie98 / PatchTST

An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
Apache License 2.0

How to use multi-GPU in patchtst_pretrain.py and patchtst_finetune.py? #52

Closed dawn0713 closed 1 year ago

dawn0713 commented 1 year ago

How to use multi-GPU in patchtst_pretrain.py and patchtst_finetune.py? Thank you.

linfeng-du commented 1 year ago

You can just use --use_multi_gpu to enable data-parallel training and specify which devices you would like to use in the code, e.g. model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]). Since it uses DataParallel, that is the only code that differs from single-GPU training. (There appear to be some DDP utilities, but they are not used in the current codebase.)
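A minimal sketch of the wrapping step described above. TinyModel is a hypothetical stand-in for the PatchTST model; in the repo you would wrap whatever the model-building code returns. On a machine with fewer than two GPUs the wrap is skipped and the model runs as-is:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the PatchTST model built by the repo's code.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyModel()

# Wrap in DataParallel only when several GPUs are visible; with
# device_ids=[0, 1, 2, 3] each batch is split across those four GPUs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])

out = model(torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 2])
```

DataParallel replicates the module on each listed device and scatters the batch along dimension 0, so no other training code needs to change.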

linfeng-du commented 1 year ago

One thing worth noting is that it filters out devices that are more than 10% occupied, so a specified device might get skipped.

dawn0713 commented 1 year ago

> You can just use --use_multi_gpu to enable data-parallel training and specify which devices you would like to use in the code, e.g. model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]). Since it uses DataParallel, that is the only code that differs from single-GPU training. (There appear to be some DDP utilities, but they are not used in the current codebase.)

In the supervised version of PatchTST, I can use --use_multi_gpu. But in the self-supervised version, there is no argument for multi-GPU.

linfeng-du commented 1 year ago

> You can just use --use_multi_gpu to enable data-parallel training and specify which devices you would like to use in the code, e.g. model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]). Since it uses DataParallel, that is the only code that differs from single-GPU training. (There appear to be some DDP utilities, but they are not used in the current codebase.)

> In the supervised version of PatchTST, I can use --use_multi_gpu. But in the self-supervised version, there is no argument for multi-GPU.

Yes, the authors didn't include it there; you can add it yourself. Note that it should be added directly after the model is instantiated (inside get_model). patchtst_pretrain.py first runs an epoch to find the learning rate, so you have to make sure the model is wrapped in nn.DataParallel before that.
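A hedged sketch of how such an argument and the wrap inside get_model might look. The flag names mirror the supervised scripts; the nn.Linear model and the get_model body here are placeholders for illustration, not the repo's actual code:

```python
import argparse

import torch
import torch.nn as nn

# Flags the self-supervised scripts lack; names follow the supervised
# scripts' convention (assumption, not the repo's actual argument list).
parser = argparse.ArgumentParser()
parser.add_argument('--use_multi_gpu', action='store_true',
                    help='wrap the model in nn.DataParallel')
parser.add_argument('--devices', type=str, default='0,1,2,3',
                    help='comma-separated GPU ids')
args = parser.parse_args(['--use_multi_gpu'])  # example invocation


def get_model(args):
    # Placeholder for the repo's get_model; the key point is that the
    # DataParallel wrap happens here, so the learning-rate-finder epoch
    # in patchtst_pretrain.py already sees the wrapped model.
    model = nn.Linear(8, 2)
    if args.use_multi_gpu and torch.cuda.device_count() > 1:
        device_ids = [int(d) for d in args.devices.split(',')]
        model = nn.DataParallel(model, device_ids=device_ids)
    return model


model = get_model(args)
```

Wrapping anywhere later (e.g. just before the main training loop) would leave the learning-rate-finder epoch running on a single GPU.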

yuqinie98 commented 1 year ago

Thanks for the great comments @linfeng-du! Please let us know if you have more problems. We usually reply quickly if you reach out via email.