Open jllavin77 opened 2 years ago
Thanks for your interest! Please, use the predict method and supply the instances that were unlabeled during the training. That way you are using the transductive capabilities of the model because those instances were also seen during the training. Hope I helped. If you still have questions don't hesitate to ask.
My question is more about having a function that returns that information in table format. Using predict doesn't provide that info. You suggest using predict on my unlabeled data, but which model should I use for that prediction? Could you provide a code example? Is it something similar to this snippet?
`######################REDUCED CODE######################
m <- selfTraining(x = xtrain, y = ytrain, learner = knn3, learner.pars = list(k = 1))
pred <- predict(m, xitest, interval="confidence")
summary(pred) `
Once I carry out this prediction, how do I get the data I'm really looking for? This way I end up with a summary of the predictions, but no clue about which label corresponds to each row. Do you see what I mean?
I think I understand what you are looking for. Could you please try this code? If it does not solve your problem, please keep asking!
library(ssc)
library(caret) # provides knn3

data(iris)

x <- iris[, -5] # instances without classes
x <- as.matrix(x)
y <- iris$Species

set.seed(1)
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx] # classes of training instances

tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

m <- selfTraining(x = xtrain, y = ytrain, learner = knn3, learner.pars = list(k = 1))

xttest <- xtrain[tra.na.idx,] # the instances that were unlabeled during training
pred.label <- predict(m, xttest)

# data.frame keeps the class names; cbind on a numeric matrix would store integer codes
xttest <- data.frame(xttest, pred.label)
xttest
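One detail to watch when building the per-row table: combining a numeric matrix with a factor through cbind() stores the factor's integer codes, while data.frame() keeps the class names. Here is a minimal base R sketch of the difference. The values and the names feats, labels, mat_tab and df_tab are made up for illustration and do not come from the package.

```r
# Toy stand-ins for the feature matrix and the predicted labels (illustrative values)
feats <- matrix(c(5.1, 3.5, 6.2, 2.9), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("Sepal.Length", "Sepal.Width")))
labels <- factor(c("setosa", "virginica"),
                 levels = c("setosa", "versicolor", "virginica"))

# cbind() returns a numeric matrix, so the factor is replaced by its integer codes
mat_tab <- cbind(feats, pred.label = labels)

# data.frame() keeps the factor column, so each row shows its class name
df_tab <- data.frame(feats, pred.label = labels)

print(mat_tab)
print(df_tab)
```

If the class names matter in the final table, the data.frame() form is probably the one you want.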
Dear @mabelc,
Thank you very much for your piece of code. It works, and was exactly what I was asking for.
Just one more question, I have read the selfTraining function documentation and cannot figure out how to change the learner parameter from KNN3 to random forest, svm or any other classifier. Is there a list of the available classifiers explained somewhere?
Thanks in advance for your kind help.
Hi,
In this paper https://cran.r-project.org/web/packages/ssc/vignettes/ssc.pdf you can find many examples with different learners. I have modified the previous example to use SVM as the learner. Basically, you can use learners from the R ecosystem; the generic functions provided will help you with that. In the example I am using the generic version of selfTraining, named selfTrainingG.
library('ssc')
library('e1071')

data(iris)

x <- iris[, -5] # instances without classes
x <- as.matrix(x)
y <- iris$Species

set.seed(1)
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx] # classes of training instances

tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

# train an SVM on the instances given by indexes, with their classes in cls
gen.learner <- function(indexes, cls)
  e1071::svm(x = xtrain[indexes, ], y = cls, type='C-classification', probability=TRUE)

# return the class probabilities for the instances given by indexes
gen.pred <- function(model, indexes){
  p <- predict(model, xtrain[indexes, ], probability=TRUE)
  attr(p, "probabilities")
}

m <- selfTrainingG(y = ytrain, gen.learner, gen.pred)

xttest <- xtrain[tra.na.idx,] # the instances that were unlabeled during training
pred.label <- predict(m$model, xttest)

# data.frame keeps the class names; cbind on a numeric matrix would store integer codes
xttest <- data.frame(xttest, pred.label)
xttest
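The same pattern should carry over to random forest. The following is only a sketch, assuming the randomForest package is installed. The calls randomForest(x, y) and predict(..., type = "prob") belong to that package and not to ssc. The as.factor() wrapper is a precaution in case cls is not already a factor, and the name result is just illustrative.

```r
library(ssc)
library(randomForest) # assumption: installed separately, not part of ssc

data(iris)
x <- as.matrix(iris[, -5]) # instances without classes
y <- iris$Species

set.seed(1)
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx, ] # training instances
ytrain <- y[tra.idx]   # classes of training instances
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

# train a random forest on the instances given by indexes
gen.learner <- function(indexes, cls)
  randomForest::randomForest(x = xtrain[indexes, ], y = as.factor(cls))

# return one row of class probabilities per instance in indexes
gen.pred <- function(model, indexes)
  predict(model, xtrain[indexes, ], type = "prob")

m <- selfTrainingG(y = ytrain, gen.learner, gen.pred)

# one row per originally unlabeled instance, with its predicted class
xttest <- xtrain[tra.na.idx, ]
pred.label <- predict(m$model, xttest)
result <- data.frame(xttest, pred.label = pred.label)
head(result)
```

Any other classifier can be plugged in the same way, as long as gen.pred returns a probability matrix with one column per class.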
Dear developers,
I was looking for a semi-supervised ML method in R and found your excellent package. I tried your example code, adapting it to my input data, and after some reformatting it apparently works well. The problem I have is how to access the prediction results for each of the rows in my input table. I may sound naive, but I can't find the code to access the classification assigned to each of the "unlabeled" rows in my table by any of the methods carried out in your vignette's example code. I can access the summary of how many samples have been assigned to each class, but I'd like to know how to access each row's individual class/label prediction (in data frame format, for instance). I hope I was able to explain myself clearly enough for everybody to understand this request. Thanks in advance, and congrats on your nice work.