pfmc-assessments / VASTWestCoast

VAST for the NWFSC West Coast data

base::poly #4

Open kellijohnson-NOAA opened 6 years ago

kellijohnson-NOAA commented 6 years ago

@jkbest2 suggested using the poly function in R (it lives in the stats package, so stats::poly rather than base::poly) to create quadratic terms rather than squaring the data. By default the function returns orthogonal polynomials, which basically means that as you add higher-order terms the coefficients for the previously modelled terms shouldn't change. For example, if a linear model has a linear coefficient of 1.5, that coefficient wouldn't change if you added a coefficient for the squared term. This will need to be implemented as a switch so users can have
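A minimal sketch of that property, using simulated data (the variable names and numbers here are made up for illustration, not taken from any VAST data set). With raw powers the linear coefficient shifts once the squared term is added; with the columns returned by poly() it does not, and the fitted values of the two quadratic models are the same.

```r
set.seed(1)
x <- runif(100, 0, 10)                      # simulated covariate
y <- 2 + 1.5 * x - 0.2 * x^2 + rnorm(100, sd = 0.5)

# Raw powers: the linear coefficient changes when x^2 enters the model
raw_lin  <- lm(y ~ x)
raw_quad <- lm(y ~ x + I(x^2))

# Orthogonal polynomial columns from stats::poly
P <- poly(x, degree = 2)
orth_lin  <- lm(y ~ P[, 1])
orth_quad <- lm(y ~ P[, 1] + P[, 2])

coef(raw_lin)[2];  coef(raw_quad)[2]        # these differ
coef(orth_lin)[2]; coef(orth_quad)[2]       # these match

# Both quadratic parameterizations describe the same fitted curve
all.equal(unname(fitted(raw_quad)), unname(fitted(orth_quad)))
```

The trade-off is interpretability: coefficients on the orthogonal columns are not on the scale of the original covariate, which is one reason a user-facing switch could be useful.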

ericward-noaa commented 6 years ago

I don't know whether it's worth including this as a switch, but another option is to include centered covariates. Internally, a covariate X would be turned into centered_x = X - mean(X), and centered_x and centered_x^2 would become the new predictors. At least in the Stan implementations I've worked on, this can improve things (https://github.com/seananderson/glmmfields)
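A small sketch of why centering can help, assuming a strictly positive covariate such as depth (the values are simulated). Before centering, X and X^2 are almost collinear; after centering, the linear and squared terms are much less correlated.

```r
set.seed(2)
X <- runif(200, 50, 150)        # simulated covariate, all values positive
centered_x <- X - mean(X)

cor(X, X^2)                     # close to 1: nearly collinear predictors
cor(centered_x, centered_x^2)   # much closer to 0
```

This is only the collinearity side of the argument; whether it improves estimation in a given model still has to be checked there.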

James-Thorson commented 6 years ago

Yeah, centering is very important in VAST when it has to exponentiate the linear predictor and can end up with numerical under/overflow without centered covariates.
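The overflow can be illustrated outside of VAST with plain double-precision arithmetic. This is a toy example with a made-up coefficient and covariate values, not VAST's internal code; in a real model the intercept would absorb the shift introduced by centering.

```r
X    <- c(800, 900, 1000)   # hypothetical uncentered covariate values
beta <- 1                   # hypothetical coefficient

exp(beta * X)               # overflows to Inf in double precision
exp(-beta * X)              # underflows to 0

centered_x <- X - mean(X)   # -100, 0, 100
exp(beta * centered_x)      # finite
```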

I always wanted to make a function expand_covariates that would take the output from format_covariates (which takes point-measurements of covariates and transforms them into the proper 3D array for VAST) and then do quadratic or even spline basis-expansions. If these worked well enough after testing, maybe they could then be pushed into Data_Fn.

Anyhoo, feel free to workshop some ideas for expand_covariates or something similar...?
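One possible starting point for the quadratic case, as a sketch only. The function below is hypothetical: it assumes the input is a 3D numeric array whose third dimension indexes covariates (the meaning of the first two dimensions is not used), and it returns an array with a centered linear slice and a squared slice for each covariate. The actual output format of format_covariates may differ.

```r
# Hypothetical helper: quadratic basis expansion of a 3D covariate array.
# Assumes dim(X)[3] indexes covariates; this layout is an assumption.
expand_covariates <- function(X, center = TRUE) {
  stopifnot(is.array(X), length(dim(X)) == 3)
  n_p <- dim(X)[3]
  out <- array(NA_real_, dim = c(dim(X)[1:2], 2 * n_p))
  for (p in seq_len(n_p)) {
    x <- X[, , p]
    if (center) x <- x - mean(x, na.rm = TRUE)  # center each covariate
    out[, , 2 * p - 1] <- x                     # linear term
    out[, , 2 * p]     <- x^2                   # quadratic term
  }
  out
}

# Usage with simulated values
set.seed(4)
X <- array(rnorm(4 * 3 * 2, mean = 100, sd = 10), dim = c(4, 3, 2))
Z <- expand_covariates(X)
dim(Z)
```

A spline version could follow the same pattern by replacing the two slices per covariate with the columns of a basis matrix, for example from splines::bs.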

kellijohnson-NOAA commented 6 years ago

What is the benefit of centering versus centering and dividing by the standard deviation, or should they be fairly equivalent? Separately, @John-R-Wallace has been investigating the use of raster data rather than point-process covariate data. It is unclear to me whether it is better to take the covariate at the knot location from something that has undergone prior interpolation, or to do what is currently being done, which is to average the points within the triangle.
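On the first question, a sketch with an ordinary linear model and simulated data suggests the two are equivalent in terms of fit: dividing by the standard deviation only rescales the coefficients. This is shown for lm, not VAST, and it says nothing about the raster question.

```r
set.seed(3)
x <- runif(100, 50, 150)                         # simulated covariate
y <- 1 + 0.05 * x - 0.0003 * x^2 + rnorm(100, sd = 0.1)

xc <- x - mean(x)                                # centered only
xs <- xc / sd(x)                                 # centered and scaled

fit_c <- lm(y ~ xc + I(xc^2))
fit_s <- lm(y ~ xs + I(xs^2))

# Same fitted values under both parameterizations
all.equal(fitted(fit_c), fitted(fit_s))

# Coefficients differ only by powers of sd(x)
coef(fit_s)[2]; coef(fit_c)[2] * sd(x)
coef(fit_s)[3]; coef(fit_c)[3] * sd(x)^2
```

Where scaling may still matter is numerics: putting covariates on a unit scale keeps coefficients on comparable magnitudes, which can help an optimizer even when the fitted model is the same.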