hafen / stlplus

Seasonal-Trend Decomposition using Loess (STL) in R

Interest in Implementing Cross Validation? #8

Status: Open. M-Harrington opened this issue 5 years ago.

M-Harrington commented 5 years ago

Hi, is there any interest in implementing cross-validation for estimating missing values in the package? I have a problem where time steps are missing at random throughout my time series, and because of the number of time series I need to estimate the NAs for, it made the most sense to perform parameter selection automatically through cross-validation.

Long story short, I've written methods for K-fold and Monte Carlo cross-validation that work with STLplus, and I've also written a function to perform a grid search, not so different from the one in scikit-learn in Python. If you're interested, I can paste the code here; or, if it's not really appropriate for the package itself, I can post my code as an answer to a self-made question on Stack Exchange.

Best, Matt

PS: the function hasn't been heavily optimized or user-proofed yet, but I figured I'd do that only if it would be used by people in general.
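A minimal sketch of the K-fold idea described above, assuming the model is passed in as a fitting function. The names kfold_score, grid_search, and fit_fn are hypothetical (not part of stlplus or STLinterp). fit_fn(x, params) is assumed to take a series containing NAs and return a reconstructed series of the same length. In each fold some observed points are hidden (set to NA), the model is refit, and the error is measured only at the hidden points; a fit that errors is scored as NA via tryCatch.

```r
# Hypothetical helper: K-fold cross-validated RMSE for one parameter set.
# fit_fn(x, params) is an assumed interface: it receives a series with NAs
# and must return a numeric vector of the same length (the reconstruction).
kfold_score <- function(x, fit_fn, params, k = 5, seed = 1) {
  obs <- which(!is.na(x))  # only observed points can be held out
  # Fixing the seed gives every parameter set the same folds
  # (note: this resets the global RNG state).
  set.seed(seed)
  folds <- rep_len(seq_len(k), length(obs))[sample.int(length(obs))]
  sq_err <- numeric(0)
  for (f in seq_len(k)) {
    held <- obs[folds == f]
    x_train <- x
    x_train[held] <- NA  # hide this fold from the fit
    recon <- tryCatch(fit_fn(x_train, params), error = function(e) NULL)
    if (is.null(recon)) return(NA_real_)  # parameter set failed to fit
    sq_err <- c(sq_err, (recon[held] - x[held])^2)
  }
  sqrt(mean(sq_err))  # RMSE over all held-out points
}

# Hypothetical grid search: score every row of a data frame of parameters.
grid_search <- function(x, fit_fn, grid, k = 5, seed = 1) {
  grid$rmse <- vapply(seq_len(nrow(grid)), function(i) {
    kfold_score(x, fit_fn, as.list(grid[i, , drop = FALSE]), k = k, seed = seed)
  }, numeric(1))
  grid[order(grid$rmse), , drop = FALSE]  # lowest RMSE first, failed fits (NA) last
}
```

With stlplus, fit_fn could wrap something like stlplus(x, n.p = 12, s.window = p$s.window) and return seasonal(fit) + trend(fit), with the grid built by expand.grid(s.window = c(7, 13, 35)); n.p = 12 assumes monthly data. A Monte Carlo variant would replace the fixed fold assignment with repeated random draws of a held-out fraction of the observed points.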

M-Harrington commented 4 years ago

I implemented a solution and have it hosted on my GitHub under the name STLinterp: https://github.com/m-harrington/stlinterp

hafen commented 4 years ago

This is great! Thanks for sharing. I think it would make a great addition to the package. Before adding it, I think a few additional things should be put in place, such as validation of the grid argument. If you are interested in polishing it up as an exported function of the package, I'd happily accept a PR and add you as a contributor.

M-Harrington commented 4 years ago

Hi @hafen, glad you got back to me! I'm more than happy to clean up the function and add a bit more functionality. I'll make some first passes at cleaning it up, but any guidance after that would be great, because I haven't done any proper R development work, so I'm bound to make mistakes.

Thanks, and I'll let you know when I've made some of those first-order corrections!

M-Harrington commented 4 years ago

Hi @hafen, I added the option to return either the best parameter set from the grid or the entire grid with its scores. I also tweaked the Monte Carlo method to make it a little easier to use. The biggest change was wrapping the STLplus call in a tryCatch, because I noticed some parameter sets could be a bit finicky and make the fit fail. Let me know what you think and if you have any other requested changes!

M-Harrington commented 3 years ago

@hafen the function is pretty much ready; I can make the pull request whenever you like. I've also written a brief tutorial on my website explaining how to use it and how to use STLplus to estimate missing values:

https://www.mattrharrington.com/post/fill-in-missing-cyclical-data-using-seasonal-trend-loess-and-cross-validation

Also, do you have any advice for preparing the documentation?

hafen commented 3 years ago

@M-Harrington great! To make it ready to drop in, can you document the functions in your script as described here: https://r-pkgs.org/man.html? Section 10.4 in particular should be useful. Basically, if you can give each function a description, document its parameters, and provide examples where necessary, that would be great.
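As a rough illustration of the kind of documentation described above, here is a roxygen2 header on a small hypothetical helper (holdout_rmse is not part of stlplus or STLinterp): a title and description, one @param per argument, @return, and an @examples section. @export marks the function for the package NAMESPACE when roxygen2 generates it.

```r
#' Root-mean-square error at held-out points
#'
#' Computes the RMSE between an original series and its reconstruction,
#' using only the indices that were hidden during fitting.
#'
#' @param x Numeric vector, the original series (may contain NAs).
#' @param recon Numeric vector of the same length as \code{x}, the reconstructed series.
#' @param held Integer vector of indices that were held out during fitting.
#' @return A single numeric value: the RMSE over the \code{held} indices.
#' @examples
#' holdout_rmse(c(1, 2, 3), c(1, 2, 5), held = 3)
#' @export
holdout_rmse <- function(x, recon, held) {
  # Fail early if the reconstruction does not line up with the original series.
  stopifnot(length(x) == length(recon))
  sqrt(mean((recon[held] - x[held])^2))
}
```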

M-Harrington commented 3 years ago

@hafen OK, I've documented the main function to the best of my ability, and everything looks mostly in order on my end. I wasn't quite sure what to do with the helper subroutines that aren't really meant to be called directly, so I've left them mostly undocumented, but I'm happy to change that if you'd like. You're welcome to check out the changes in STLinterp.R on my GitHub, or you can add me as a collaborator and I'll open the pull request.

I did have one quick question before I submit this, just to make sure: how is the fc component factored into the prediction? Previously I assumed that everything was captured in the seasonal and trend components, but should I be adding or averaging in the fc component as well? That is, instead of reconstruction <- seasonal(stlobj) + trend(stlobj), should it be something like reconstruction <- seasonal(stlobj) + trend(stlobj) + fc(stlobj)?