rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

Multidimensional and Multistep (window scrolling) stock price prediction #536

Closed kejsiStruga closed 5 years ago

kejsiStruga commented 5 years ago

Hello everybody,

I have a set of data, let's say like the one below:

[screenshot: screenshot from 2018-09-22 00-40-00]

Now, to cut it short, the question is: how do I split the dataset into training and test sets, i.e. turn it into a form that is ready for supervised learning?

I should predict: median_house_value

Other information:

time_step: 20 // make each prediction based on the 20 previous values, aka window_size

prediction_range: 100 // should predict 100 data points ahead

So if the last training example is on 2/2/2018, the predicted values should start from 3/2/2018 and end 99 points later.
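The windowing described above can be sketched as follows. This is a minimal illustration in plain Python, not the method from any of the linked posts. It assumes a single target column, already sorted by time, held in a list; the function names `make_windows` and `chronological_split`, the 80/20 split, and the toy series are all hypothetical. Each sample pairs `time_step` consecutive values with the `prediction_range` values that follow them, and the split is chronological rather than shuffled.

```python
def make_windows(series, time_step=20, prediction_range=100):
    """Turn a time-ordered sequence into (input window, target window) pairs.

    Each input holds `time_step` consecutive values; its target holds the
    `prediction_range` values that come immediately after.
    """
    inputs, targets = [], []
    # Last start index that still leaves room for a full input and target.
    last_start = len(series) - time_step - prediction_range
    for start in range(last_start + 1):
        split = start + time_step
        inputs.append(series[start:split])
        targets.append(series[split:split + prediction_range])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split samples in time order: earlier windows train, later ones test."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Toy stand-in for the target column (hypothetical data).
series = [float(i) for i in range(300)]

X, y = make_windows(series, time_step=20, prediction_range=100)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

Because neighbouring windows overlap, the first test windows share values with the last training targets. If strict separation matters, one option is to drop `time_step + prediction_range - 1` samples at the boundary.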

I have come across several examples, each doing this in a different way. I understand that it depends on the data, but for now I am looking for more general guidelines.

I have seen this, but in another post the author had the following:

[screenshot: screenshot_3]

So he bases the output values on 2 data points in order to normalize. But I cannot see why. What is the point of doing so?
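The screenshot is not available, so the exact method in that post is unknown. If the two data points are the minimum and maximum of the training data (an assumption), then the step is min-max scaling: the same two numbers rescale inputs and targets into a comparable range and later map predictions back to the original units. A minimal sketch, with hypothetical function names and toy values:

```python
def fit_min_max(train_values):
    """Return the (min, max) pair computed from the training data only."""
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(values, lo, hi):
    """Map values so that lo becomes 0 and hi becomes 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    """Invert scale(), e.g. to read predictions in the original units."""
    return [v * (hi - lo) + lo for v in values]


# Hypothetical toy values standing in for the target column.
train = [10.0, 12.0, 20.0, 15.0]
test = [18.0, 22.0]

lo, hi = fit_min_max(train)            # statistics come from train only
scaled_train = scale(train, lo, hi)
scaled_test = scale(test, lo, hi)      # may fall outside [0, 1]
restored = unscale(scaled_test, lo, hi)
```

Fitting the two values on the training portion only keeps information from the test period out of the model's inputs.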

Please ask me if additional info is needed.

Thanks!!

skeydan commented 5 years ago

Hi,

The example dataset you display does not look time-series-related, but from the other specifications I gather that this is about time series prediction. For a systematic introduction to how to approach this, I'd recommend chapter 6 of the book Deep Learning with R (resp. Deep Learning with Python). The R code is also available here: https://github.com/jjallaire/deep-learning-with-r-notebooks/blob/24a1e6a3178e853b3b412bbf8ee2088d625940de/notebooks/6.3-advanced-usage-of-recurrent-neural-networks.Rmd

As for that Python post, https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/, I remember I found it an instructive example when I read it (a while ago).

kejsiStruga commented 5 years ago

Hey @skeydan, thanks for your feedback! The dataset was not the real one; the real one does have a time dimension. In any case, I was able to solve the problem: the highlighted area in the screenshot was the normalization step, which had not been clear to me.

I am closing this. Regards!