nanxstats / Rcpi

💊 Molecular informatics toolkit with integration of bioinformatics and cheminformatics tools for drug discovery
https://nanx.me/Rcpi/
Artistic License 2.0

Error when training three classification models #12

Closed. Arek86 closed this issue 3 years ago

Arek86 commented 3 years ago

Hi, I'm trying to run the whole example script. Unfortunately, I got stuck at the step where we train three classification models. After I run the command:

svm.fit1 <- train( x1.tr, y.tr, method = "svmRadial", trControl = ctrl, metric = "ROC", preProc = c("center", "scale") )

I get this error message: Error: Please use column names for x

I'm quite new to programming and I don't know how to resolve this problem. Can you help me? I would be grateful.

Best regards, Arek

nanxstats commented 3 years ago

Apparently caret has changed over the years: train() now stops with this error when x has no column names. After assigning names to the columns, it works:

library("Rcpi")
library("caret")
library("kernlab")

fdamdd.smi <- system.file("vignettedata/FDAMDD.smi", package = "Rcpi")
fdamdd.csv <- system.file("vignettedata/FDAMDD.csv", package = "Rcpi")
x.mol <- readMolFromSmi(fdamdd.smi, type = "mol")
x.smi <- readMolFromSmi(fdamdd.smi, type = "text")
y <- as.factor(paste0("class", scan(fdamdd.csv)))

x1 <- extractDrugEstateComplete(x.mol)
x1 <- x1[, -nearZeroVar(x1)]

# train() in current caret stops if x has no column names, so assign some
colnames(x1) <- paste0("x", seq_len(ncol(x1)))

set.seed(1003)
tr.idx <- sample(1:nrow(x1), round(nrow(x1) * 0.75))
te.idx <- setdiff(1:nrow(x1), tr.idx)
x1.tr <- x1[tr.idx, ]
x1.te <- x1[te.idx, ]
y.tr <- y[tr.idx]
y.te <- y[te.idx]

ctrl <- trainControl(
  method = "repeatedcv", number = 5, repeats = 10,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

svm.fit1 <- train(
  x1.tr, y.tr,
  method = "svmRadial", trControl = ctrl,
  metric = "ROC", preProc = c("center", "scale")
)
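
The column-name fix above can also be written as a small guard that only assigns names when they are missing. This is a minimal sketch in base R: `ensure_colnames` is a hypothetical helper, not part of Rcpi or caret, and the toy matrix `m` merely stands in for the descriptor matrix `x1`.

```r
# Hypothetical helper (not from Rcpi or caret): give a matrix generic
# column names only if it has none, leaving existing names untouched.
ensure_colnames <- function(x, prefix = "x") {
  if (is.null(colnames(x))) {
    colnames(x) <- paste0(prefix, seq_len(ncol(x)))
  }
  x
}

# Toy 4 x 3 matrix standing in for the descriptor matrix x1
m <- matrix(1:12, nrow = 4)
is.null(colnames(m))   # TRUE: this is the situation train() complains about

m <- ensure_colnames(m)
colnames(m)            # "x1" "x2" "x3"
```

Calling something like `x1 <- ensure_colnames(x1)` before splitting into training and test sets would have the same effect as the `colnames(x1) <- ...` line, while keeping any descriptor names that are already present.
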

Arek86 commented 3 years ago

Thank you very much. It works now!