johnnymdoubleu / lassoSSNAL

Semismooth Newton Augmented Lagrangian Method implemented in R
GNU General Public License v3.0

read.libsvm() #10

Open johnnymdoubleu opened 2 years ago

johnnymdoubleu commented 2 years ago
library(Matrix)  # sparseMatrix()
library(glue)    # glue()

# `filename` should be just the file name, not a path, e.g. "data.txt"
read.libsvm <- function(filename) {
  content <- readLines(filename)
  num_lines <- length(content)
  # split each line on whitespace: the first token is the label,
  # the remaining tokens are index:value pairs
  tokens <- strsplit(trimws(content), '\\s+')
  # label rows (row, 0, label); the label lands in column 1 after the +1 shift below
  tomakemat <- cbind(1:num_lines, 0, as.numeric(sapply(tokens, `[`, 1)))
  # feature rows (row, index, value), one per index:value pair
  makemat <- rbind(tomakemat,
                   do.call(rbind, lapply(1:num_lines, function(i) {
                     feats <- tokens[[i]][-1]
                     if (length(feats) == 0) return(NULL)  # line holds a label only
                     cbind(i, do.call(rbind, strsplit(feats, ':')))
                   })))
  class(makemat) <- "numeric"
  sparseMat <- sparseMatrix(i = makemat[,1], j = makemat[,2]+1, x = makemat[,3])
  # write the features (label column dropped) to <name>.csv
  write.csv(as.matrix(sparseMat)[,-1], glue("{sub('[.]txt$', '', filename)}.csv"), row.names = FALSE)
  return(sparseMat)
}
johnnymdoubleu commented 2 years ago

This code takes a long time to run because it is written in R rather than C. Using this code from the LIBSVM library
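
If staying in R is acceptable, much of the running time can probably be recovered by dropping the per-line `lapply` and splitting all `index:value` pairs in one vectorised pass. The following is only a minimal sketch under that assumption. It is not the LIBSVM code mentioned above, and `read_libsvm_vec` is a hypothetical name that does not exist in lassoSSNAL. Unlike `read.libsvm()`, it returns the labels separately instead of storing them in column 1, and it does not write a CSV file.

```r
library(Matrix)  # sparseMatrix()

# Hypothetical helper (not part of lassoSSNAL): vectorised LIBSVM reader.
read_libsvm_vec <- function(filename) {
  content <- trimws(readLines(filename))
  content <- content[nzchar(content)]           # drop blank lines
  tokens  <- strsplit(content, "\\s+")          # label, then index:value pairs
  labels  <- as.numeric(vapply(tokens, `[`, character(1), 1))
  feats   <- lapply(tokens, `[`, -1)            # index:value tokens per line
  n_feats <- lengths(feats)                     # number of pairs on each line
  pairs   <- unlist(feats)
  # split every pair in one pass instead of once per line
  idx <- as.integer(sub(":.*$", "", pairs))
  val <- as.numeric(sub("^[^:]*:", "", pairs))
  x <- sparseMatrix(i = rep(seq_along(tokens), n_feats), j = idx, x = val,
                    dims = c(length(tokens), max(idx, 0L)))
  list(y = labels, x = x)
}

# Usage on a tiny two-line file written to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 1:0.5 3:2", "-1 2:1.5"), tmp)
res <- read_libsvm_vec(tmp)
res$y             # labels
as.matrix(res$x)  # dense 2 x 3 view of the features
```

Whether this is fast enough for the data sets used here has not been measured; a C-backed reader would still be expected to be quicker on large files.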