Open ghost opened 5 years ago
The problem is that cpo.retrafo must handle the validation data during resampling. The CPO is run twice: once on the training data (cpo.trafo runs) and once on the validation data (cpo.retrafo runs). Your implementation simply returns the transformed training data during the "retrafo" phase, completely ignoring the incoming validation data. The CPO framework notices that the number of rows does not match, but the problem goes deeper: there is no straightforward way to obtain a corresponding representation of new prediction data under a t-SNE transformation. t-SNE does not seem well suited to preprocessing as part of a machine learning pipeline, because it is nonparametric: a model trained on the transformed training data would not be able to handle prediction data.
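To make the trafo/retrafo contract concrete, here is a minimal base-R sketch. It does not use mlrCPO or Rtsne; it swaps in PCA (`prcomp`) as a parametric stand-in, and the helper names `trafo`, `retrafo`, and `broken_retrafo` are hypothetical, chosen only for illustration. The point is that a retrafo step has to map the new rows it receives, which PCA can do through its stored rotation and which t-SNE has no equivalent for.

```r
# Split iris into "training" and "validation" rows, numeric columns only.
train <- as.matrix(iris[1:100, 1:4])
valid <- as.matrix(iris[101:150, 1:4])

# "trafo": fit on the training data, keep the fitted object as control,
# and return the transformed training data.
trafo <- function(data, rank) {
  fit <- prcomp(data, center = TRUE, scale. = FALSE)
  list(control = fit, data = fit$x[, seq_len(rank), drop = FALSE])
}

# "retrafo": project the NEW data using the stored fit.
retrafo <- function(data, control, rank) {
  predict(control, newdata = data)[, seq_len(rank), drop = FALSE]
}

# Mimics the pattern from the question: ignore the incoming data and
# hand back the transformed training data instead.
broken_retrafo <- function(data, control_data) {
  control_data
}

res <- trafo(train, rank = 2)
valid_out <- retrafo(valid, res$control, rank = 2)
broken_out <- broken_retrafo(valid, res$data)

dim(res$data)    # 100 rows, 2 columns
dim(valid_out)   # 50 rows, 2 columns: matches the validation input
dim(broken_out)  # 100 rows, 2 columns: does not match the 50 validation rows
```

Applying the CPO directly to a data set only exercises the trafo step, so the row counts agree there; the mismatch only shows up once a resampling loop feeds a differently sized validation set through the retrafo step.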
    cpoTsne = makeCPOExtendedTrafo("t-sne",  # nolint
      pSS(rank: integer[1, ]),
      dataformat = "numeric",
      cpo.trafo = function(data, target, rank) {
        outTsne = Rtsne(as.matrix(data), dims = rank, perplexity = 10, max_iter = 100)
        control = outTsne$Y
      },
      cpo.retrafo = function(data, control, rank) {
        control
      })

    lrn = cpoTsne(rank = 2) %>>% makeLearner("classif.ksvm")
    resample(lrn, task, resampling = outer_loop, measures = list(mmce), show.info = FALSE)
Hello, if I execute the code above, I get the following error message:

    Error in recombineLL(df, newdata, targetcols, strict.factors, subset.index, :
      Number of rows of numeric data returned by t-sne did not match input
      CPO must not change row number.
This error message only appears if I use makeLearner() and resample(). But the following command works:

    data %>>% cpoTsne(rank = 2)

      classif           V1           V2
    1       0 -2.055749824 -1.801610596
    2       1 -0.469646936  3.347391844
    3       1 -0.194586726  0.057422613
    4       0  1.070363088  3.380600350
    5       1 -0.567965508  3.096630889
    ...

The number of rows is the same as in data. Where is the problem? It must be in cpo.retrafo. Thanks