An offical implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
It varies on different datasets, epochs, GPU... thus it would be hard to answer. The fastest one may take half an hour while the largest model takes a day on multiple GPUs.
any estimate on how long it will take to run supervised and self-supervised learning based on default model and params.