sktime / pytorch-forecasting

Time series forecasting with PyTorch
https://pytorch-forecasting.readthedocs.io/
MIT License
Ability to do multi-target forecasting? #84

Closed emigre459 closed 3 years ago

emigre459 commented 3 years ago

I noticed that the TimeSeriesDataSet class is designed to only look at one column for the target variable. In my use case, I'm trying to forecast traffic patterns and, as such, will need to forecast X and Y simultaneously. Given that these are expected to be somewhat correlated (as the presence of a road, for example, isn't equally probable at all latitudes and longitudes), I don't think it would work to build two separate models that forecast each in isolation. Is there a theoretical limitation in the models currently included in the package that makes it impossible to have more than one target variable?

AlexMRuch commented 3 years ago

Ah, I was also interested in knowing whether this library could do multi-target forecasting!

jdb78 commented 3 years ago

This is indeed a limitation right now. It could be a great contribution! If you want to work on this, I believe what is needed:

BTW @emigre459: What exactly are you forecasting? Are X and Y not static input variables, with the target being a single variable such as vehicles/minute? I'm not sure I understand the problem well enough.

AlexMRuch commented 3 years ago

The case I was thinking of, for what it's worth, was based on the notebook I shared with you and the other issue. It would be interesting to model the number of positive cases across states and the number of deaths across states simultaneously, so that both are target variables predicted from the other model covariates. I still don't know if that's the best idea for my case, though. On the one hand, you would probably want the number of new positive cases to be a predictor of the mortality target; on the other hand, mortality could be a predictor of new cases in the sense that it indicates the severity of the disease. I'm not sure whether having both as targets would help the model predict better, given that it could then learn the intricacies and interactions of both.

emigre459 commented 3 years ago

@jdb78 I've got to do some testing to make sure there aren't currently any "gotchas" when using my dataset with a single target, but I expect it to be fine. If that turns out to be the case, I can try taking on the multi-target issue. To clarify, and sorry for mixing domain-specific notation for my variable names with machine learning notation: the X and Y I was referring to were longitude and latitude, not features and target(s). The task is, given a specific route for a vehicle (e.g. a trucking fleet that runs similar routes multiple times over a sufficiently long time period), to predict the future path of any given vehicle. So I'm trying to input lat and long values for early timesteps and get the model to forecast the lat and long values for the future. Both would be time-varying, with the early timesteps known and the future ones unknown (which I believe are the semantics used by the TimeSeriesDataSet class).
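The split between known early timesteps and unknown future ones can be sketched in plain Python. This is only an illustration of the windowing idea for two targets at once; `make_windows`, `encoder_length`, `prediction_length` and the sample track are hypothetical names and data, not part of the pytorch-forecasting API.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def make_windows(
    track: List[Point], encoder_length: int, prediction_length: int
) -> List[Tuple[List[Point], List[Point]]]:
    """Split one vehicle track into (known history, future to predict) pairs."""
    total = encoder_length + prediction_length
    windows = []
    # Slide over the track; each window needs `total` consecutive observations.
    for start in range(len(track) - total + 1):
        history = track[start : start + encoder_length]
        future = track[start + encoder_length : start + total]
        windows.append((history, future))
    return windows


# Hypothetical track of five (longitude, latitude) observations.
track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
windows = make_windows(track, encoder_length=3, prediction_length=2)
```

Each `future` element carries both coordinates, so a model trained on these pairs would have to emit two values per timestep rather than one.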

@AlexMRuch I see what you're saying about the difficulty in your case separating targets and features. This may be something best solved by experimentation. The good news is that you can test the single-target case right now!

jdb78 commented 3 years ago

On the implementation: what should work fairly well is to stack the weights in the time series dataset instead of concatenating them. A multi-target metric should then be fairly easy to implement, because there should no longer be any confusion between weights and targets.
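One possible reading of this suggestion, as a rough sketch: if the weights are held apart from the targets, a metric can loop over the target columns without having to guess which column is a weight. The function `multi_target_mae` and its `[timestep][target]` layout are assumptions made for illustration; they are not the library's implementation.

```python
from typing import List, Optional, Sequence


def multi_target_mae(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted mean absolute error, computed separately for each target.

    predictions and targets are indexed [timestep][target]; weights, if given,
    holds one value per timestep and is kept apart from the targets.
    """
    n_steps = len(targets)
    n_targets = len(targets[0])
    if weights is None:
        weights = [1.0] * n_steps  # unweighted case
    weight_sum = sum(weights)
    errors = []
    for k in range(n_targets):
        total = sum(
            w * abs(p[k] - t[k]) for p, t, w in zip(predictions, targets, weights)
        )
        errors.append(total / weight_sum)
    return errors


# Hypothetical values: two timesteps, two targets.
targets = [[1.0, 10.0], [2.0, 20.0]]
predictions = [[1.5, 12.0], [2.5, 16.0]]
unweighted = multi_target_mae(predictions, targets)
weighted = multi_target_mae(predictions, targets, weights=[1.0, 3.0])
```

Returning one error per target leaves the choice of how to combine them (sum, mean, per-target scaling) to the caller.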

@emigre459 I think forecasting the deltas of x and y, or even the deltas of speed and direction, should work far better. However, if you know the graph on which those vehicles operate, a graph network is likely to do a better job.
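The delta idea amounts to a simple change of representation: train on step-to-step differences, then add the predicted differences back onto the last known position. A minimal sketch in plain Python follows; the helper names `to_deltas` and `from_deltas` and the sample track are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def to_deltas(track: List[Point]) -> List[Point]:
    """Convert absolute positions into step-to-step differences."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def from_deltas(start: Point, deltas: List[Point]) -> List[Point]:
    """Rebuild absolute positions by accumulating deltas from a known start."""
    positions = [start]
    for dx, dy in deltas:
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    return positions


# Hypothetical track of three positions.
track = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0)]
deltas = to_deltas(track)
rebuilt = from_deltas(track[0], deltas)
```

Because the reconstruction is a running sum, errors in predicted deltas accumulate over the forecast horizon, which is worth keeping in mind when comparing this against forecasting absolute positions.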

emigre459 commented 3 years ago

Thanks for the ideas @jdb78! I'll definitely play around with that idea of forecasting changes in position as well as absolute (normalized) positions. I suspect that absolute positions will be necessary because certain expected behaviors are dependent upon absolute location (e.g. taking an on-ramp to the interstate or pulling into a warehouse), but I'll do some experiments too!

Is there a multi-target dataset you can think of that would be a good test case for multi-target feature development? I'd like something akin to MNIST that I can reasonably expect a good training run on if the implementation is sound.

jdb78 commented 3 years ago

I guess you could try something in the recommendation space (which product was bought when)? But I am not aware of a standard dataset. Time series forecasting is extremely heterogeneous.

Amit12690 commented 3 years ago

@emigre459 I am working on a problem very similar to yours. I am trying to forecast the x and y positions of an object.

The input features are [x_position, y_position, velocity, acceleration]. Did you manage to solve the multi-target forecasting problem?

emigre459 commented 3 years ago

Sorry @Amit12690, I had to move on from this problem when I switched jobs and haven't come back to it. That said, it looks like PR #199 addressed this. Have you taken a look at that? Does it not work for you for some reason?