Open HanjoStudy opened 6 years ago
Can you try this without parallel processing and with a small, reproducible example that I can test with?
Can you also send the results of `sessionInfo()`?
Updated as requested. A small, reproducible example is available as a gist here
@HanjoStudy: Forget `parallel` altogether. The `fit` function is never called.
The obvious fix for that is to redefine it as: `ordinalForestFit <- function(x, y, param, lev, last, classProbs, ...) { ... }`
but then you have other issues, as the `predict` function is also incomplete.
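For illustration, here is a minimal sketch of what a `fit` module with the expected arguments could look like. It is not the code from the eventual PR. It assumes the tuning parameters are named `nsets` and `nbest`, that the outcome is stored in a column called `.outcome`, and it uses a hypothetical helper `bindOutcome`. The `wts` argument is included because caret's custom-model documentation lists it and `train` passes it by name.

```r
# Hypothetical helper: ordfor() takes one data frame plus the name of the
# outcome column, so predictors and outcome are bound together first.
bindOutcome <- function(x, y, outcome = ".outcome") {
  dat <- if (is.data.frame(x)) x else as.data.frame(x)
  dat[[outcome]] <- y
  dat
}

# Sketch of a caret `fit` module. `param` is a one-row data frame holding
# the current tuning values (assumed here to be `nsets` and `nbest`).
ordinalForestFit <- function(x, y, wts, param, lev, last, classProbs, ...) {
  dat <- bindOutcome(x, y)
  ordinalForest::ordfor(
    depvar = ".outcome",  # assumed outcome column name
    data   = dat,
    nsets  = param$nsets,
    nbest  = param$nbest,
    ...
  )
}
```

Keeping the data preparation in a separate helper makes it possible to check that step without fitting a forest.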
If you want, I can make a PR about `ordinalForest` during the weekend.
Suggestion for the future: Try to follow the style of the other model files. It will save you time and trouble. I am not suggesting that your code structure is worse or that the current style is better. Nevertheless, the current style is the existing one, it is prevalent throughout the whole code base, and it makes it easy to compare files.
Thanks @hadjipantelis for the advice! I do agree with you; I think next time I would rather start with a caret model skeleton and work back from there. No rush in implementing the model, I am just slowly trying to wrap my head around the caret framework's underlying components, and this `ordinalForest` seemed like a good example to try it out on since it hasn't been implemented.
I'll also keep playing with the code a bit over the next week or so.
@hadjipantelis is right that the `fit` module should have the right arguments.
Another issue is the `predict` (and likely the `prob`) module. This package follows the crappy precedent set by `ranger` that doesn't just return the predictions. You'll need to use the `ypred` component:
> ?ordfor
> data(hearth)
>
> set.seed(123)
> trainind <- sort(sample(1:nrow(hearth), size=floor(nrow(hearth)*(1/2))))
> testind <- sort(sample(setdiff(1:nrow(hearth), trainind), size=20))
>
> datatrain <- hearth[trainind,]
> datatest <- hearth[testind,]
>
> ordforres <- ordfor(depvar="Class", data=datatrain, nsets=60, nbest=5)
>
> preds <- predict(ordforres, newdata=datatest)
>
> preds
Predicted values of 20 observations.
Classes of ordinal target variable:
"1", "2", "3", "4", "5"
> names(preds)
[1] "ypred" "classfreqtree"
> preds$ypred
[1] 1 1 1 1 1 4 1 1 1 4 1 1 1 5 1 1 4 4 4 4
Levels: 1 2 3 4 5
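As a rough sketch of how a caret `predict` module could unwrap that component (an illustration under the assumption that the fitted object's `predict` method returns a list with a `ypred` element, as in the transcript above; the name `ordinalForestPredict` is made up here):

```r
# Sketch of a caret `predict` module: return only the predicted classes,
# not the whole prediction object.
ordinalForestPredict <- function(modelFit, newdata, preProc = NULL, submodels = NULL) {
  newdata <- as.data.frame(newdata)
  out <- predict(modelFit, newdata = newdata)
  out$ypred  # factor of predicted classes
}
```

The argument list follows the `predict` module signature used in caret's custom-model documentation.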
@topepo Yes, I fully agree (that's why I said it is "incomplete"). OP seems busy so I will drop the PR in the next day or two.
@topepo Some quick questions so we save ourselves back and forth after the PR.
- The model type is "Classification" (as with `polr`), not "Regression", because the response variable is expected to be a factor. Are we OK with that?
- `ordfor` does not use sample weights but rather class weights. Weights themselves are used when a custom performance function is used (see `?ordinalForest::perff` for details). The user can theoretically still pass them through the `...` if inclined, but I am not testing this or including this functionality out of the box as it is non-standard (usually weights are an `nrow(x)`-long vector).
- `LOOCV` does not work. This is by design, as single-row `newdata` will error, e.g.:
library(ordinalForest)
data(hearth)
datatrain <- hearth[-1,]
datatest <- hearth[1,]
ordforres <- ordfor(depvar="Class", data=datatrain, nsets=60, nbest=5)
predict(ordforres, datatest)
I can e-mail the maintainer about this but this is a separate issue.
Aside from these, the PR is ready on my fork. :)
Hello. Shouldn't the hyperparameters of the ordinal forest be set? nbest = 50 and nset = 10 is not enough, and the ordinal forest will be suboptimal.
Needing help with
I have recently come across the `ordinalForest` package. This package focuses on an implementation of the well-known `ranger` Random Forest model. With this in mind, I am trying to integrate the model into the caret framework, without any luck. Here is what I have so far:
The code
The package can be installed from `CRAN`:
Now that the initial code is written, I can define the default training grid. Some models can do a random search, but I won't implement that as the [paper](https://epub.ub.uni-muenchen.de/41183/1/TR.pdf) states the defaults are pretty good.
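A minimal sketch of what the parameter table and a fixed default grid could look like for a custom caret model. The choice of `nsets` and `nbest` as tuning parameters, the labels, and the numeric values are assumptions for illustration (the values are meant to mirror the package defaults; check `?ordfor` before relying on them):

```r
# Parameter table for a custom caret model: one row per tuning parameter.
ordinalForestParams <- data.frame(
  parameter = c("nsets", "nbest"),
  class     = c("numeric", "numeric"),
  label     = c("Number of score sets", "Number of best score sets"),  # illustrative labels
  stringsAsFactors = FALSE
)

# Grid module: returns a single candidate, ignoring `len` and `search`
# because no random search is implemented.
ordinalForestGrid <- function(x, y, len = NULL, search = "grid") {
  data.frame(nsets = 1000, nbest = 10)  # assumed defaults, not tuned values
}
```

A one-row grid means `train` fits a single configuration per resample, which matches the intent of relying on the defaults.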
For the fitting function, the `ordfor` function does have a strange implementation in defining `X` and `Y`. I try and overcome this by explicitly asking for `X` and `Y`, binding them and then implementing the `ordfit` function:
I also notice that the classes are stored in a strange slot as well:
Next I add the prediction functions:
Lastly, I add the sorting function, which determines how the tuning parameters are ordered in case similar performance is obtained:
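One possible shape for such a `sort` module, assuming the tuning parameters are named `nsets` and `nbest` and that smaller values should be preferred when performance ties (both are assumptions, not taken from the original code):

```r
# Sketch of a caret `sort` module: order candidate parameter sets so that
# smaller (cheaper) settings come first.
ordinalForestSort <- function(x) {
  x[order(x$nsets, x$nbest), , drop = FALSE]
}

# Usage with a small, made-up grid
grid <- data.frame(nsets = c(100, 50, 50), nbest = c(5, 10, 5))
ordinalForestSort(grid)
```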
The test
Now that the hard work is done, let's get the party started:
With great disappointment, we get the usual fallback error:
There were missing values in resampled performance measures
Basic Implementation
To test the basic implementation, we can use the following code:
Session info