An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
Apache License 2.0
How to use learner.distributed() in the self-supervised pretraining code? #96
How can I use the self-supervised pretraining code to train on multiple GPUs or multiple nodes? I tried to revise the code for multi-GPU or multi-node training, since I have a large dataset and a large model, but I have not succeeded.
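A minimal sketch of one way to do this with plain PyTorch DistributedDataParallel (DDP), independent of the repo's own `learner.distributed()` helper. The model, dataset shapes, and the `pretrain` function below are placeholders for illustration, not PatchTST's actual classes; the masked-patch reconstruction objective mimics the paper's pretraining idea. Launched with `torchrun`, each process gets `RANK`/`WORLD_SIZE`/`LOCAL_RANK` from the environment, and `DistributedSampler` shards the data across ranks.

```python
# Hypothetical DDP pretraining sketch (not the repo's API).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def pretrain(epochs: int = 1) -> float:
    """Masked-reconstruction pretraining loop under DDP; returns last loss."""
    # torchrun sets these; the defaults let the script run single-process.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # "gloo" works on CPU; switch to "nccl" for multi-GPU runs.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Toy stand-in for patched series: 64 samples, 16 patches of length 8.
    x = torch.randn(64, 16, 8)
    ds = TensorDataset(x)
    # DistributedSampler gives each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(ds, num_replicas=world_size, rank=rank)
    dl = DataLoader(ds, batch_size=8, sampler=sampler)

    # Placeholder encoder; the real model would be the PatchTST backbone.
    model = torch.nn.Sequential(
        torch.nn.Linear(8, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8)
    )
    model = DDP(model)  # gradients are all-reduced across ranks
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    last_loss = 0.0
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for (batch,) in dl:
            # Mask ~40% of patches; reconstruct them from the masked input.
            mask = (torch.rand(batch.shape[0], batch.shape[1], 1) < 0.4).float()
            out = model(batch * (1 - mask))
            loss = ((out - batch) ** 2 * mask).mean()  # loss on masked patches only
            opt.zero_grad()
            loss.backward()
            opt.step()
            last_loss = loss.item()

    dist.destroy_process_group()
    return last_loss

if __name__ == "__main__":
    print(pretrain())
```

For a single node with 4 GPUs this would be launched as `torchrun --nproc_per_node=4 pretrain_ddp.py`; for multi-node, add `--nnodes`, `--node_rank`, and `--rdzv_endpoint`. The key pieces are the same regardless of the repo's wrapper: one process per GPU, a `DistributedSampler` so ranks see disjoint data, and DDP handling the gradient all-reduce.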