Will you open-source the model weights that achieved the best performance in the paper?
I noticed that the GPUs used in the paper are "NVIDIA A100 Tensor Core GPUs". What are the minimum GPU requirements for pre-training this model? Is a single V100 sufficient?
Great work! I have two questions about Timer-XL.