GFabien opened this issue 2 years ago
As far as I know, caret's `train()` function performs a 5-fold (?) cross-validation by default while training the model. The `lm()` function of course doesn't, which is why the latter is much faster.
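If resampling is indeed the concern, caret can be told to skip it entirely via `trainControl(method = "none")`, which fits the model once on the full data set. A minimal sketch (the `mtcars` data is purely illustrative):

```r
library(caret)

# Disable resampling: fit the final model once on all the data.
# "none" is a documented option of trainControl().
ctrl <- trainControl(method = "none")

fit <- train(mpg ~ ., data = mtcars, method = "lm", trControl = ctrl)
summary(fit$finalModel)
```

Note that with `method = "none"` caret requires a single candidate model, so for tunable models the `tuneGrid` must have exactly one row; for `"lm"` the default grid already has one.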
It seems that the `caret::train` function calls `stats::lm` only once. I wonder whether the additional time is due to all the checks performed. I will try to dive deeper into this problem.
I looked into this problem and found two reasons responsible for this performance issue:

1. Inside the `caret::train` function there is a call to `system.time`. Replacing it with two calls to `proc.time` divides the computation time by 10.
2. The call to `getModelInfo`. Hence, if the model is fitted a high number of times, this causes some overhead. To pay this cost only once, `getModelInfo` can be called outside of the function, and the `method` argument can be filled directly with its result. Doing that gains an extra factor of 10 in computation time for my simple example.

I would be glad to make a PR to fix the first point.
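The second workaround described above can be sketched as follows; passing a model-specification list as the `method` argument is supported by caret (this is the same mechanism as its custom-model interface), and the loop count here is illustrative:

```r
library(caret)

# Look up the model specification once, outside any loop.
# getModelInfo() returns a named list of model specs; "lm" selects
# the linear-model entry.
lm_info <- getModelInfo("lm", regex = FALSE)[["lm"]]

# No resampling, to isolate the fitting cost itself.
ctrl <- trainControl(method = "none")

# Pass the pre-fetched spec directly as `method`, so train() does not
# have to resolve the model by name on every call.
for (i in 1:5) {
  fit <- train(mpg ~ ., data = mtcars, method = lm_info, trControl = ctrl)
}
```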
Hey! Thanks for the great package! I am using caret to be able to use a wide range of models directly, and this is really easy thanks to caret. However, I realized that fitting a linear regression using caret's `train` function was slower than fitting with `stats::lm` directly (see the benchmark reported below). Is there anything I'm missing in how I am doing the training with caret? Here I don't want to tune any parameters or perform any splitting of my data. Thank you for your help!
Minimal, runnable code:
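The original code block did not survive extraction; the following is only a sketch of the kind of benchmark described (the synthetic data, sizes, and repeat counts are illustrative, not the reporter's originals):

```r
library(caret)

# Illustrative synthetic data; the original benchmark used its own data.
set.seed(1)
n  <- 1000
df <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))

# No resampling, no tuning: compare a single fit per call.
ctrl <- trainControl(method = "none")

t_caret <- system.time(
  for (i in 1:10) train(y ~ ., data = df, method = "lm", trControl = ctrl)
)
t_lm <- system.time(
  for (i in 1:10) lm(y ~ ., data = df)
)

t_caret["elapsed"]
t_lm["elapsed"]
```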
Session Info: