khliland / pls

The pls R package

Types of model calls #14

Closed wordsmith189 closed 5 years ago

wordsmith189 commented 5 years ago

In your JStatSoft article about pls, you show model calls like this one

model <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

where the dependent variable octane seems to be explained by one of the independent variables, NIR. However, some of the online tutorials for pls don't name an independent variable at all; they simply write

(octane ~ ., data=gasTrain, ...)

Can you explain what the difference is, or rather: what the effect of making one of the independents explicit in the model call is?

jimmyamp commented 5 years ago

Hi, wordsmith189,

I guess there are only two variables (octane and NIR) in the data set, whether gasoline or gasTrain.

> str(gasoline)
'data.frame': 60 obs. of 2 variables:
 $ octane: num 85.3 85.2 88.5 83.4 87.9 ...
 $ NIR   : 'AsIs' num [1:60, 1:401] -0.0502 -0.0442 -0.0469 -0.0467 -0.0509 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "1" "2" "3" "4" ...
  .. ..$ : chr "900 nm" "902 nm" "904 nm" "906 nm" ...

There may be no difference between

(octane ~ NIR, data=gasTrain, ...)

and

(octane ~ ., data=gasTrain, ...)

I may be wrong. Please correct me if my guess is incorrect.

Thanks in advance.

bhmevik commented 5 years ago

You are absolutely right. gasTrain is simply the first 50 rows of the gasoline data set, which has only two variables, octane and NIR. In R, a formula like something ~ . is a shortcut for something ~ all + the + other + variables in the data, so in this case octane ~ . is equivalent to octane ~ NIR.
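If you want to check the expansion yourself, here is a minimal sketch in base R. It does not use the real gasoline data: toy and toy3 are made-up data frames (a hypothetical stand-in with the same shape of columns), and extra is an invented third variable added only to show when the two formulas would stop being equivalent.

```r
# Made-up stand-in for gasoline: a numeric response plus one
# matrix-valued predictor stored with I(), as in the str() output above.
set.seed(1)
toy <- data.frame(octane = rnorm(10))
toy$NIR <- I(matrix(rnorm(10 * 5), nrow = 10))

# terms() expands the dot using the column names of `data`.
expanded_two <- formula(terms(octane ~ ., data = toy))
labels_two <- attr(terms(octane ~ ., data = toy), "term.labels")
print(expanded_two)

# With a third (invented) column, the dot picks that up as well,
# so octane ~ . and octane ~ NIR are no longer the same model.
toy3 <- toy
toy3$extra <- runif(10)
expanded_three <- formula(terms(octane ~ ., data = toy3))
labels_three <- attr(terms(octane ~ ., data = toy3), "term.labels")
print(expanded_three)
```

So making NIR explicit only matters once the data frame holds more columns than the ones you want as predictors; with exactly two columns, both spellings describe the same model.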

wordsmith189 commented 5 years ago

Got it. Thank you both!