topepo / caret

caret (Classification And Regression Training) is an R package that contains miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Looking for time complexity for a few models #834

Closed. KarlesP closed this issue 6 years ago.

KarlesP commented 6 years ago

I was trying to use a few models from the caret library for my thesis, and I wanted to ask whether there is a way to find the time complexity of lda, svmRadial, knn, random forests, and rpart, because there is nothing about it in the documentation.

topepo commented 6 years ago

find the time complexity

Do you mean a measure of how long the model takes to train?

KarlesP commented 6 years ago

Yes, but in a theoretical sense. I can measure how long a model takes in practice with system.time() in R, but that is empirical; I can't cite a reason why it takes my machine so long to compute the estimates with these models.
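A minimal sketch of that kind of empirical measurement, assuming only base R plus caret (the iris data and the rpart method are illustrative choices, not anything prescribed in the thread):

```r
library(caret)

# Wall-clock timing of a single train() call; the "elapsed" entry
# is the number that reflects how long you actually waited.
system.time(
  fit <- train(Species ~ ., data = iris, method = "rpart")
)
```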

topepo commented 6 years ago

No, as far as I know there are no theoretical measurements of the training time. train has a sub-object called times that measures the execution time empirically:

```r
> train(Species ~ ., data = iris)$times
$everything
   user  system elapsed 
  2.629   0.051   2.680 

$final
   user  system elapsed 
  0.015   0.000   0.016 

$prediction
[1] NA NA NA
```
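
Per the train() documentation, `everything` is the execution time for the entire call to train(), `final` is the time to fit the final model to the full training set, and `prediction` records prediction times when available (NA here).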

KarlesP commented 6 years ago

It is my fault; I wasn't specific enough. What I meant was that the documentation does not have anything like this. Even though it would be of limited use, since the library was built for practical application rather than theoretical background, I still thought I should mention it as an idea.

topepo commented 6 years ago

I don't know of anything like that appearing in software documentation.

I can't reference the reason that it takes so long for my machine to calculate my estimations using those models.

More detail on the problem and data might help solve that issue (as well as a reproducible example). It is sometimes hard to tell because, in the past, others have had somewhat unreasonable expectations about how long something should take.

KarlesP commented 6 years ago

Ok, I did some research and found what I was looking for. It turns out I was talking about the Big-O of the models, which translates to the following question: I want to find the time complexity and the space complexity of a model; how can I do that?

You can find the Big-O analysis in Section 3.2 of that paper. If you don't want to download it for security reasons, the paper is called "Time Complexity Analysis of Support Vector Machines (SVM) in LibSVM" by Abdiansah A. and Wardoyo R. (2015). big-o.pdf
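
For rough orientation, commonly quoted textbook figures (not guarantees about caret's particular implementations) are: kernel SVM training somewhere between O(n²) and O(n³) in the number of samples n, which is the kind of analysis the paper above carries out for LibSVM; a single rpart tree around O(p · n log n) for p predictors; a random forest roughly that cost multiplied by the number of trees; LDA about O(n p² + p³); and kNN essentially free to "train" but O(n p) per prediction.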

I hope I'm not asking for too much. Even so, I believe it would be a nice function for your library.

topepo commented 6 years ago

I see.

I hope I'm not asking for too much.

Don't ever worry about asking for something =]

Even so, I believe it would be a nice function for your library.

The fastest way to make that happen is to send in a pull request with some code.

timcdlucas commented 6 years ago

I have been hoping to find something similar. Table 10.1 here has a very rough Good/Bad rating. Apparently there is a similar table with Big-O somewhere, but I can't find it.

I keep meaning to make some datasets (one long, one wide, one small?), run all the caret models on them, and put the resulting table in a blog post, but I never get around to it.
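
A minimal sketch of that benchmark idea, with several assumptions flagged: the make_data() helper, the dataset shapes, and the three methods are all illustrative, and resampling is switched off so that only the final model fit is timed:

```r
library(caret)

set.seed(1)

# Simulated regression data: n rows, p predictors (illustrative helper).
make_data <- function(n, p) {
  d <- as.data.frame(matrix(rnorm(n * p), n, p))
  d$y <- rnorm(n)
  d
}

shapes  <- list(long = c(5000, 10), wide = c(200, 500), small = c(100, 10))
methods <- c("lm", "rpart", "knn")

timings <- expand.grid(shape = names(shapes), method = methods,
                       stringsAsFactors = FALSE)

# Fit each method on each dataset shape and record the elapsed time
# that train() stores in its `times` sub-object.
timings$elapsed <- apply(timings, 1, function(row) {
  d <- do.call(make_data, as.list(shapes[[row["shape"]]]))
  fit <- train(y ~ ., data = d, method = row["method"],
               tuneLength = 1,
               trControl = trainControl(method = "none"))
  fit$times$everything[["elapsed"]]
})

timings
```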

timcdlucas commented 6 years ago

And just to add this: https://stats.stackexchange.com/questions/270686/supervised-machine-learning-classifiers-big-o

For models that use optimisation to find their parameters, Big-O isn't very useful.
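
Even so, one can estimate an empirical scaling exponent when no clean theoretical bound is available: time the same model at several training-set sizes and regress log(time) on log(n); the slope approximates k in O(n^k). A minimal sketch, assuming kernlab is installed for svmRadial and using illustrative sizes:

```r
library(caret)

set.seed(2)
ns <- c(500, 1000, 2000, 4000, 8000)

# Elapsed training time at each sample size, with resampling
# switched off so only the single model fit is measured.
elapsed <- sapply(ns, function(n) {
  d <- as.data.frame(matrix(rnorm(n * 10), n, 10))
  d$y <- rnorm(n)
  fit <- train(y ~ ., data = d, method = "svmRadial",
               tuneLength = 1,
               trControl = trainControl(method = "none"))
  fit$times$everything[["elapsed"]]
})

# Slope of log(time) on log(n) estimates the empirical exponent k.
coef(lm(log(elapsed) ~ log(ns)))[2]
```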