tymefighter / Forecast

Add model based on the paper "Modeling Extreme Events in Time Series Prediction" #55

Open tymefighter opened 3 years ago

tymefighter commented 3 years ago

We can have two variants of this model - one for single-sequence and another for multi-sequence

yvonni360 commented 3 years ago

In Train.py, in the function TrainOneSeq, what is self.memOut.trainable_variables?

tymefighter commented 3 years ago

@yvonni360

tymefighter commented 3 years ago

@yvonni360 We were not sure about our implementation of the paper. Hence, we placed it in the other section of the package. It would be great if you could help us confirm our implementation of the paper.

yvonni360 commented 3 years ago

Thank you for your answer.

I think what is different compared to their paper is that you use the parameter seqLength in the train function at init. So, if I understand correctly, you do the whole procedure for every sequence, whereas they do it only once for the entire sequence. Correct me if I am wrong.

tymefighter commented 3 years ago

That is correct. The reason for doing this is that their paper trains on multiple time series, whereas we wanted to build a model that trains on a single time series. Performing just one gradient-descent step to update the model parameters after seeing the entire time series does not really train the parameters on that single series. Hence, we decided to break the time series into disjoint, contiguous seqLength-sized sequences and perform a gradient-descent update on the parameters after each such sequence.
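
Roughly, the idea looks like the following sketch (a generic Keras model and illustrative names such as `model`, `optimizer` and `seq_length`; this is not the package's actual Train.py code):

```python
import tensorflow as tf

def train_single_series(model, optimizer, x, y, seq_length):
    """Break one long series into disjoint contiguous chunks of length
    seq_length and run one gradient-descent update per chunk."""
    num_timesteps = x.shape[0]
    for start in range(0, num_timesteps, seq_length):
        end = min(start + seq_length, num_timesteps)
        x_chunk = x[start:end][tf.newaxis, ...]  # add a batch dimension
        y_chunk = y[start:end][tf.newaxis, ...]

        with tf.GradientTape() as tape:
            pred = model(x_chunk, training=True)
            loss = tf.reduce_mean(tf.square(pred - y_chunk))  # plain MSE

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
```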

yvonni360 commented 3 years ago

Thank you for the clarification.

wentixiaogege commented 3 years ago

Which model file are we talking about? https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time.py or https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time2.py?

tymefighter commented 3 years ago

Hi @wentixiaogege, @yvonni360 was talking about a deprecated implementation, which is present in this directory: Forecast/other/model1

wentixiaogege commented 3 years ago

Oh, so which one is the implementation of this paper "Modeling Extreme Events in Time Series Prediction"? I think it is Forecast/other/model1?

Also, which paper are those two implementations based on? Please share:

https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time.py https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time2.py

tymefighter commented 3 years ago

Hi @wentixiaogege, "Modeling Extreme Events in Time Series Prediction" was implemented earlier, but it's performance on a few test datasets were not satisfactory, hence we introduced minor modifications to the model in extreme_time and extreme_time2.

Currently, we require a well-performing implementation implementation of "Modeling Extreme Events in Time Series Prediction" , hence this issue was created

wentixiaogege commented 3 years ago

Thanks. So, do you plan to implement the Uber version of this paper, "Time-series Extreme Event Forecasting with Neural Networks at Uber"?

tymefighter commented 3 years ago

Yeah, we plan to implement that one as well, but since we are interested only in point forecasts and not prediction uncertainty, we will exclude the MC Dropout step during the prediction stage of the algorithm.
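
Roughly, the difference at prediction time would look like this (illustrative names only, not code from this package):

```python
import tensorflow as tf

def point_forecast(model, x_input):
    """Single deterministic pass: dropout layers are inactive, so we get
    only a point prediction and no uncertainty estimate."""
    return model(x_input, training=False)

def mc_dropout_forecast(model, x_input, num_samples=100):
    """For comparison, the MC Dropout step keeps dropout active at
    prediction time and aggregates several stochastic passes."""
    samples = tf.stack(
        [model(x_input, training=True) for _ in range(num_samples)], axis=0)
    return tf.reduce_mean(samples, axis=0), tf.math.reduce_std(samples, axis=0)
```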

falcoxman commented 3 years ago

Hi,

I have been dealing with the implementation of this paper "Modeling Extreme Events in Time Series Prediction" recently. I have implemented it in PyTorch and made some changes on top of the proposed methodologies as well. But I could not get good results training on a single univariate time series and predicting on the test part of the same data. I will check your implementation to compare with mine; if I detect an upgrade or change, I will inform you. Can you clarify which files you have implemented and what their differences are, please? I could not fully understand from the previous comments. Thanks in advance.

  1. https://github.com/tymefighter/Forecast/tree/master/other/model1
  2. https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time.py
  3. https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time2.py
tymefighter commented 3 years ago

@falcoxman Hi !

I don't think the code in https://github.com/tymefighter/Forecast/tree/master/other/model1 would be very useful, since that implementation did not produce satisfactory results and took too long to train.

The algorithms in https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time.py and https://github.com/tymefighter/Forecast/blob/master/ts/model/extreme_time2.py are modeled after that paper, but have been modified to reduce training time and improve prediction results.

However, I was not very satisfied with their results either, so I will be implementing the original algorithm again.

Also, we have worked on developing a better algorithm focused on forecasting extreme values, with improved prediction results and much lower training times. However, I cannot release it in this package since we have not published it yet.

falcoxman commented 3 years ago

@tymefighter Thank you for the reply.

I had a chance to check your implementation. I have a couple of opinions/questions and would like to share what I have done differently.

  1. I approached the problem as if we have both right-extreme and left-extreme events, and therefore two thresholds. Thus, in my proposed approach I created two trainable variables, "b1" and "b2", that are learned.

  2. As I see it, you are subsetting windows only from past data to build the historical windows. But I think we can take windows from any part of the training data; in Figure 3 of the paper, that is what they appear to be doing.

  3. In my opinion, building a memory module for every time step t and every epoch slows training considerably and does not lead to good performance. I think we can build one memory module per epoch as an alternative and obtain similar results.

  4. I have not seen the "Extreme Value Loss" (EVL) in your objective function; it seemed to me to be only MSE. I may be wrong. Can you mention in which file or function you take EVL into account, both for the memory and for the final loss function? (A rough sketch of such a combined loss is included after this list.)

  5. Which threshold values are you using? Do you have a single value, or two different values for left and right extreme events separately?

  6. In my experiments I have preferred validation on a separate validation set instead of cross-validation. As I see it, time series mostly do not have such commonalities in the extreme parts over the course of time. That's why my model starts to overfit the training data after extracting a smoothed expected fit for the train and validation sets. Therefore, the performance of the model has been quite similar to LSTM and GRU. What do you think? Do you have similar or dissenting opinions?

  7. When do you plan to finish the original algorithm?

These are the first questions that come to my mind. If I have more, I will share them with you. Thank you very much in advance.
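
For reference, here is a rough sketch of how an EVL-style term could be combined with MSE, as I read the paper's weighted cross-entropy form; every name (`u`, `v`, `beta0`, `beta1`, `gamma`, `lam`) is illustrative, and the exact weighting should be double-checked against the paper rather than taken from this sketch:

```python
import tensorflow as tf

def extreme_value_loss(u, v, beta0, beta1, gamma):
    """EVL-style term: a class-weighted binary cross-entropy on the
    predicted extreme indicator u against the observed indicator v,
    modulated by (1 - p / gamma) ** gamma factors.  beta0 / beta1 are
    class-proportion weights; their exact assignment should be checked
    against the paper.  Assumes gamma >= 1 so the pow base stays
    positive for u in (0, 1)."""
    eps = 1e-7
    u = tf.clip_by_value(u, eps, 1.0 - eps)
    pos = -beta0 * tf.pow(1.0 - u / gamma, gamma) * v * tf.math.log(u)
    neg = (-beta1 * tf.pow(1.0 - (1.0 - u) / gamma, gamma)
           * (1.0 - v) * tf.math.log(1.0 - u))
    return tf.reduce_mean(pos + neg)

def total_loss(y_true, y_pred, u, v, lam=0.1, beta0=0.9, beta1=0.1, gamma=1.0):
    """MSE on the point forecast plus a lambda-weighted EVL term;
    lam = 0 recovers the MSE-only behaviour discussed above."""
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    return mse + lam * extreme_value_loss(u, v, beta0, beta1, gamma)
```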

tymefighter commented 3 years ago

Hey @falcoxman,

  1. My current implementation targets only "right extremes". It's great if your approach targets both of them.

  2. I think this part is not quite clear from the paper; I thought that using the past data for constructing the memory was more intuitive. In Figure 3, they mention the use of [x1, ..., xt] for constructing the memory at timestep t, so I guess this is in favor of using only the past data.

  3. That is exactly what I have done in the modification: instead of building it at every timestep, I build it once every epoch.

  4. Your observation is correct; the EVL loss is not present in those implementations. The reason is that, experimentally, I found that dropping the EVL component from the total loss did not lead to a decrease in performance.

  5. The current implementation targets only "right extremes", hence the algorithm takes as input a single threshold.

  6. My opinion in this context seems to conflict with yours: the paper mentions that extreme values tend to have "limited" degrees of freedom, hence they are memorized using a memory network. But you say that future extreme values aren't much related to previous extreme values. I feel that if this were true, then there would be no point in storing the "history" using a memory network architecture. What I normally do for time series data is break a single long time series (say, with T timesteps) into 3 parts sequentially - train (first 0.8T timesteps), validation (next 0.1T timesteps) and test (last 0.1T timesteps) - then train the model with different hyperparameters on the training set, select the best one using its performance on the validation set, and finally compute the chosen model's results on the test set (see the split sketch after this list).

  7. Well, to be honest, I have already implemented this algorithm, but in a "rough" and "unclean" manner for benchmarking our model, and I thought that cleaning it up and adding it to this Python package would be a lot of work unless it were of use to someone else. But now that I find it could really be of help to you, I'll try my best to put up the algorithm within a few days. Similarly, I have implemented the Uber paper as well (again in a "rough" manner), and I will add it to this package within a few days after cleaning it up. I will notify you as soon as they are up!

I would be really happy to be of any help to you! Please share any more questions you may have. Thank you as well!
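
As a small illustration of the sequential split described in point 6 (array names are only for the example):

```python
import numpy as np

def sequential_split(series, train_frac=0.8, val_frac=0.1):
    """Split one long series into contiguous train / validation / test
    segments, preserving temporal order (no shuffling)."""
    n = len(series)
    train_end = int(train_frac * n)
    val_end = int((train_frac + val_frac) * n)
    return series[:train_end], series[train_end:val_end], series[val_end:]

# Example: T = 1000 timesteps -> 800 train, 100 validation, 100 test
series = np.arange(1000, dtype=np.float64)
train, val, test = sequential_split(series)
```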

falcoxman commented 3 years ago

Hi @tymefighter,

Replying to some of your numbered points:

  1. I totally agree it is not clear. What I have understood is that subsets of sliding windows are fed to the GRU to optimize EVL. In my opinion, it could be M sliding windows from any random part of the training data. They also mention that they build s_j from the last hidden state. So, if we take the last hidden state of different sliding windows once EVL is optimized, as I see it, they end up being quite similar values. When I applied this, I got very similar u_t values for each time, for example [0.324, ...., 0.315, ..., 0.334] and [-0.176, ..., -0.173, ... -0.181] for time t. That's why the memory module could not provide proper differentiation for me. What I have proposed is to remove the memory module; instead, I created two binary classification models that learn the extreme-value indicator for sliding windows by optimizing EVLs (see the classifier sketch after this list). If we have two outputs, the softmax of the right element gives the extreme-value probability, for example for right extreme events. Moreover, what I have seen in your solution is that you are not utilizing the hidden values but are using the output of the GRU to build the memory module. I have also tried that, but it did not perform well.

  2. EVL seemed to me a very smart approach, but it does not help performance much. In the paper, there is a coefficient on EVL in the main objective function. Setting that lambda value to something like 0.1 instead of 0 has provided very little improvement over LSTM/GRU, but it could be a random effect; I am not sure.

  3. In my opinion, what you have been doing is the correct approach. I think there is an effect of extremes, but the proposed memory module may not be performing well. That is why I have proposed learning them. And in the regression problem, we can use those two binary classifiers to adjust the prediction value with the help of the extreme-value indicators.

  4. Thank you very much for your effort and contribution. It is very helpful to discuss the paper with you.
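
As a rough TensorFlow illustration of the classifier idea in point 1 (only a sketch of the approach described above, not my actual PyTorch code; all names and sizes are illustrative):

```python
import tensorflow as tf

def build_extreme_classifier(window_size, num_features, hidden_units=32):
    """GRU over a sliding window followed by a 2-way softmax; the
    probability of the 'extreme' class can then be used to adjust the
    regression model's point forecast."""
    inputs = tf.keras.Input(shape=(window_size, num_features))
    hidden = tf.keras.layers.GRU(hidden_units)(inputs)  # last hidden state
    probs = tf.keras.layers.Dense(2, activation="softmax")(hidden)
    return tf.keras.Model(inputs, probs)

# One classifier per side, each trained with its own threshold / weighted loss.
right_clf = build_extreme_classifier(window_size=24, num_features=1)
left_clf = build_extreme_classifier(window_size=24, num_features=1)
```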

lhb1992 commented 2 years ago

@tymefighter have you continued to research and develop extreme event time series prediction? Do you have any updates based on "Modeling Extreme Events in Time Series Prediction"?

YingxiaoKong commented 1 year ago

@tymefighter Have you ever thought about why equation 14 in the paper would work? 'When there is a similarity between the current time step and certain extreme events in history, then ut will help detect such a pumping point by setting ut non-vanishing' - is this saying that there is actually some sign indicating the occurrence of an upcoming extreme event? A lot of the time, extreme events just happen for no apparent reason, like a sudden AWS outage; I don't think the previous time series data can actually predict that.

Gczmy commented 4 months ago

Hi @falcoxman, do you have any plan to make your PyTorch implementation of EVL public? I also want to reproduce the paper's results.