nok-halfspace / Transformer-Time-Series-Forecasting


Example with data? #1

Closed halvgaard closed 3 years ago

halvgaard commented 3 years ago

Thanks for a nice paper on medium :-)

Will you add a runnable example with data at some point? It would be really nice to test this method, but it's difficult to reverse engineer what the data input and expected columns look like. Maybe it's just a single time series with timestamps and values? But it seems like you have a lot of preprocessing with many sensors.

nok-halfspace commented 3 years ago

Hi Rasmus. Thanks for the feedback ! :)

I unfortunately cannot add a runnable example at the moment, as the dataset is covered by an NDA. I have reached out to the company that provided the data to ask whether they can supply a cleaned dataset that I could post for you. But to answer your question, I am using univariate time series. The preprocessing strips all other variables so that the model only takes humidity. In the transformer, this gives an input dimension of 7 (humidity + 6 encodings for the timestamp).

In the full research project, I also tried multivariate inputs, but this didn't improve the model, so I omitted it from the Medium article.
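For anyone trying to reproduce the input shape described above, here is a minimal sketch of how humidity plus six cyclical timestamp encodings could form the 7-dimensional input. The exact cycles (hour, day-of-month, month) and column names are assumptions, not the repo's actual preprocessing:

```python
import numpy as np
import pandas as pd

def encode_timestamps(df: pd.DataFrame) -> pd.DataFrame:
    """Add six cyclical sin/cos encodings of the timestamp (assumed cycles)."""
    ts = pd.to_datetime(df["timestamp"])
    df["sin_hour"] = np.sin(2 * np.pi * ts.dt.hour / 24)
    df["cos_hour"] = np.cos(2 * np.pi * ts.dt.hour / 24)
    df["sin_day"] = np.sin(2 * np.pi * ts.dt.day / 31)
    df["cos_day"] = np.cos(2 * np.pi * ts.dt.day / 31)
    df["sin_month"] = np.sin(2 * np.pi * ts.dt.month / 12)
    df["cos_month"] = np.cos(2 * np.pi * ts.dt.month / 12)
    return df

# Hypothetical univariate humidity series
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=4, freq="h"),
    "humidity": [0.41, 0.43, 0.40, 0.38],
})
df = encode_timestamps(df)
features = df[["humidity", "sin_hour", "cos_hour",
               "sin_day", "cos_day", "sin_month", "cos_month"]]
print(features.shape)  # (4, 7)
```

The point is only that one value column plus six timestamp encodings yields the stated dimension of 7 per time step.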

diegoquintanav commented 3 years ago

From what I understood from the code, I made some changes to port my data:

```python
start = np.random.randint(0, len(self.df[self.df["filtered_id"] == idx]) - self.T - self.S)
sensor_number = str(self.df[self.df["filtered_id"] == idx][["sensor_id"]][start : start + 1].values.item())
index_in = torch.tensor([i for i in range(start, start + self.T)])
index_tar = torch.tensor([i for i in range(start + self.T, start + self.T + self.S)])
_input = torch.tensor(self.df[self.df["filtered_id"] == idx][attrs][start : start + self.T].values)
target = torch.tensor(self.df[self.df["filtered_id"] == idx][attrs][start + self.T : start + self.T + self.S].values)
```


- My data is sampled daily, so I changed the args in main.py to reflect this:

```python
training_length=48,  # use 2 days
forecast_window=24,  # to predict one
```


- the dataloader is loading data from different sensors. For a univariate time series, I'm assuming this is a single sensor so I added a column `df_test["sensor_id"] = 1` to the train and test dataset
- using the same logic, I added `df_test["filtered_id"] = 1`
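The two added columns from the steps above can be sketched like this (the `timestamp`/`humidity` column names are placeholders for whatever the actual dataset uses):

```python
import pandas as pd

# Hypothetical univariate test series; column names follow the thread above.
df_test = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=72, freq="D"),
    "humidity": range(72),
})

# Single sensor, so every row maps to the same sensor/group id,
# satisfying the dataloader's expectation of these two columns.
df_test["sensor_id"] = 1
df_test["filtered_id"] = 1

print(df_test.columns.tolist())
```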

After this, I got the model training with my own dataset and producing _some_ results. They look odd: the model seems to be learning the window average, so I need to check that. But it serves as a starting point, I think.
ymlea commented 3 years ago

Hello, interesting application. Could you please help with the following:

1. How should year be used in the model? Raw year values?
2. For DataLoader.py:
   a. Could you explain explicitly how the data were made?
   b. I don't get the use of index_in and index_tar; also, it seems you only used one sensor's data?
   c. The final dataset (_input) seems to contain only 1 sample (48 time points)?

Thanks for answering.

nok-halfspace commented 3 years ago

Hello,

Thank you for your interest in my work.

Without having tried it myself, I assume scaled year values could be added (without cyclical encoding, since the year has no cyclic nature). However, unless the data correlates with the year, it may not be necessary.
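As a minimal sketch of that suggestion: min-max scaling the year rather than encoding it cyclically (the specific scaling choice is an assumption):

```python
import pandas as pd

# Hypothetical years column: scale linearly to [0, 1] instead of sin/cos,
# since years do not repeat the way hours or months do.
years = pd.Series([2018, 2019, 2020, 2021])
scaled = (years - years.min()) / (years.max() - years.min())
print(scaled.tolist())  # [0.0, 0.333..., 0.666..., 1.0]
```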

To answer your questions: the preprocessing was specific to my dataset, which I could not publish for NDA reasons. However, the preprocessing script is on my GitHub, so it is possible to follow the steps I took.

`index_in` is the index of the input data for the model, and `index_tar` is the index of the target values I predict.

I use multiple sensors to train and test, please see lines 29 and 30 in my Preprocessing.py script.

The output is 25 samples, please see Inference.py script.

Best, Natasha


ymlea commented 3 years ago

Thanks for your reply. Could you also explain how the timestamps were used as positional encodings? It seems you feed all of them in as data and only disregard them when calculating the prediction and the error. So the transformer will automatically treat them (all the columns other than the one used in calculating the prediction error) as positional encoding?
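For reference, the mechanism being asked about can be sketched as follows; all dimensions here are assumptions for illustration, not values from the repo:

```python
import torch

S, batch, feat = 24, 1, 7  # forecast window, batch size, humidity + 6 encodings
prediction = torch.randn(S, batch, 1)  # model outputs the value column only
target = torch.randn(S, batch, feat)   # full feature rows, encodings included

# Only the value column (index 0, assumed) enters the loss; the timestamp
# encodings are inputs that let the model infer position, not targets.
loss = torch.nn.functional.mse_loss(prediction, target[:, :, 0:1])
print(loss.item() >= 0)  # True
```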

Best, Sean

nok-halfspace commented 3 years ago

I have now uploaded a modified version of the dataset used :)