Closed Eliav2479 closed 1 year ago
For a single step into the future, would this help?
parser.add_argument('--target_points', type=int, default=1, help='forecast horizon')
The current masking in the code is random: https://github.com/yuqinie98/PatchTST/blob/e66adfdd4cc5ed9760bbfbfc6bf68d5afc82cbc6/PatchTST_self_supervised/src/callback/patch_mask.py#L113 Do you have a causal-mask PyTorch implementation you are considering?
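For reference, a standard causal (look-ahead) mask in PyTorch is just an upper-triangular matrix of `-inf` added to the attention scores before the softmax, so position i cannot attend to any position j > i. A minimal sketch (not from this repo):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # -inf above the diagonal; added to attention scores so that
    # position i cannot attend to any future position j > i.
    mask = torch.full((seq_len, seq_len), float("-inf"))
    return torch.triu(mask, diagonal=1)

# Example: uniform scores plus the mask, then softmax.
# Masked (future) positions receive exactly zero attention weight.
scores = torch.zeros(4, 4) + causal_mask(4)
weights = torch.softmax(scores, dim=-1)
```

This is the same mask that `nn.Transformer.generate_square_subsequent_mask` produces, if you prefer the built-in.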
This does not address my question. I was talking about run time issues
Training time can be addressed in many different ways - multi-GPU, a larger batch size, a faster data loader... Why do you think the causal mask is your main bottleneck?
Please read the question
To make it clear, I didn't write this code/paper; like you, I am just using it. In the open-source community it is not always easy to understand each other. I would suggest being kinder in order to get assistance.
When you have a window size of H and a causal mask, you can predict H tokens in a single pass.
Indeed the method is patch-based, so it might not be the best fit for predicting a single data point. You might want to use only the pre-training stage with patches to create an embedding. For the second stage (fine-tuning) you can have a very simple regression from the embedding that predicts a single time step (a 1-layer NN without patches).
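A minimal sketch of that second stage, assuming the frozen pre-trained encoder maps each window to an embedding of size `d_model` (the encoder itself and the value of `d_model` are placeholders here, not the repo's actual API):

```python
import torch
import torch.nn as nn

d_model = 128  # assumed size of the pre-trained encoder's embedding

# 1-layer regression head: embedding -> single next time step
head = nn.Linear(d_model, 1)

# In practice `embedding` would come from the frozen PatchTST encoder;
# here a random tensor of shape (batch, d_model) stands in for it.
embedding = torch.randn(32, d_model)
pred = head(embedding)  # shape (32, 1): one step ahead per window
```

Only the head's parameters need to be trained, which keeps the fine-tuning stage very cheap.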
I would suggest to wait for the authors for a response. Thank you for replying.
Thanks for asking @Eliav2479, and sorry for the late reply. Unfortunately we do not understand your question very well, so we would appreciate it if you could explain your concern in more detail. We basically agree with the solution that @ikvision proposed if you want to apply it to multi-step prediction. Alternatively, you could directly do multi-step forecasting (DMS rather than IMS, in the sense of this paper: https://arxiv.org/pdf/2205.13504.pdf): the input is X1,...,Xt and the output is Xt+1,...,Xt+T, which is done in one pass.
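The DMS idea above can be sketched as a single linear map from the input window to all T future steps at once (along the lines of the linear baselines in that paper; the window length and horizon below are illustrative):

```python
import torch
import torch.nn as nn

t, T = 96, 24  # input window length and forecast horizon (illustrative)

# Direct multi-step (DMS): one projection emits all T future steps
# in a single forward pass, instead of iterating a one-step model
# T times and feeding predictions back in (IMS).
dms = nn.Linear(t, T)

x = torch.randn(32, t)  # batch of input windows X1,...,Xt
forecast = dms(x)       # shape (32, T): Xt+1,...,Xt+T in one pass
```

DMS avoids the error accumulation of iterated one-step forecasting, at the cost of committing to a fixed horizon T at training time.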
Any estimate on how long it will take to run supervised and self-supervised learning with the default model and params?
It varies with the dataset, number of epochs, GPU... so it is hard to answer. The fastest run may take half an hour, while the largest model takes a day. @DIKSHAAGARWAL2015
I want to congratulate you for the great patch transformer paper.
I want to ask a question: I have a dataset which I hold as a pandas DataFrame.
Given some window size I want to predict the next time step:
Given X1, ..., Xt, predict Xt+1
This means I want to predict only a single step into the future.
As I understand it, if I want to use your model for this task I will need as many forward passes as the dataset size, since you are not using a causal mask in the transformer.
How can this be resolved?
Thanks