liaoyuhua / tempo-pytorch

Reproduction of the paper "TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting"
MIT License

Tempo-Pytorch on GPU Clusters #6

Open pranavjha1706 opened 3 months ago

pranavjha1706 commented 3 months ago

Hello @liaoyuhua, thanks for making the tempo-pytorch implementation available. However, I am facing an issue while running it on a GPU cluster: training fails with an out-of-memory error:

> CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 13.53 GiB already allocated; 7.75 MiB free; 13.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Can you please share how you were able to run this (compute details, if possible) and how long the training process took?

Thanks in advance for your help.
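
For reference, the `max_split_size_mb` hint in the error message is applied through the `PYTORCH_CUDA_ALLOC_CONF` environment variable, which the allocator reads when CUDA is initialized. A minimal sketch of setting it from Python (the 128 MiB value is an arbitrary starting point, not something from this repo):

```python
# Set the allocator option before anything initializes CUDA; setting it
# before importing torch is the safest order. The 128 MiB split size is
# a guessed starting point to tune, not a recommended value.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the allocator picks it up
```

This only mitigates fragmentation; if the model genuinely does not fit in ~16 GiB, the batch size or model size still has to come down.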

m6129 commented 2 months ago

> Can you please share how you were able to run this (compute details, if possible) and how long the training process took?

Same question here. For reference, this is the config I used after reducing `n_layer` from 6 to 4 to cut memory usage:

```python
config = TEMPOConfig(
    num_series=3,
    input_len=trainset.seq_len,
    pred_len=trainset.pred_len,
    n_layer=4,  # was 6; reduced by 2 to lower memory usage
    model_type='gpt2',
    patch_size=16,
    patch_stride=8,
    lora=True,
    lora_config={
        'lora_r': 4,
        'lora_alpha': 8,
        'lora_dropout': 0.1,
        'enable_lora': [True, True, False],
        'fan_in_fan_out': False,
        'merge_weights': False,
    },
    prompt_config={
        'embed_dim': 768,
        'top_k': 3,
        'prompt_length': 3,
        'pool_size': 30,
    },
    interpret=False,
)
```
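
If memory is still tight after shrinking the model, the standard PyTorch levers are smaller batches, mixed precision, and gradient accumulation. A minimal sketch of such a loop; `model`, the loss, and the tensor shapes are placeholders, since this is not the repo's actual trainer:

```python
# Memory-reduction sketch, not the repo's trainer: mixed precision shrinks
# activation memory, and gradient accumulation recovers the effective batch
# size. `model` is assumed to be built from `config`; adapt the forward
# call and loss to the actual TEMPO interface.
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda")
loader = DataLoader(trainset, batch_size=8, shuffle=True)  # smaller batches
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # effective batch size = 8 * 4

optimizer.zero_grad(set_to_none=True)
for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    with torch.cuda.amp.autocast():  # fp16 activations reduce peak memory
        pred = model(x)              # placeholder forward call
        loss = torch.nn.functional.mse_loss(pred, y) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```

With `batch_size=8` and `accum_steps=4`, the gradient statistics approximate a batch of 32 while peak activation memory stays at that of a batch of 8.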