mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com
Other
1.64k stars 405 forks source link

Strongly suggest Refactoring our featureExtraction code #1439

Closed smilesun closed 6 years ago

smilesun commented 7 years ago

@berndbischl Currently in "TSfeatures.R" We are using the following function to convert a TimeseriesClassif Task into a normal task, the problem is we're passing a String called "method" as an argument which lets the user choose which method they want to use with the " switch" statement.

convertTSTaskToNormalTask = function(task, method, pars = NULL) {
  # check if task
  assertClass(task, classes = "Task")

  #check for Time Series Classif Task
  if (!any(class(task) == "TimeSeriesClassifTask"))
    stop("Task is not a 'TimeSeriesClassifTask'. Please check task.")
  #check valid feature extraction method
  if (!(method %in%  c("wavelets", "fourier", "shapelets")))
    stop("Method for feature extraction must be one of 'wavelets' or 'fourier' or 'shapelets'. Please check method.")
  # #check for valid pars
  # if (!(all(names(pars) %in% c("filter", "boundary", "fft.coeff"))))
  #   stop("Pars includes non valid arguments. Must be filter or boundary (wavelets) or fft.coeff (fourier).")

  target = task$task.desc$target
  z = getTaskData(task, target.extra = TRUE, recode.target = "-1+1")

  switch(method,
         wavelets = {tsf = getTSWaveletFeatures(data = z$data, target = target, include.target = FALSE, filter = pars$filter, boundary = pars$boundary)},
         fourier = {tsf = getTSFourierFeatures(data = z$data, target = target, include.target = FALSE, fft.coeff = pars$fft.coeff)}
         )

  if (method == "shapelets") {
    #FIXME: how to correctly use pars here for additional parameters?
    modelSh = getTSShapeletFeatures(curves = z$data, label.train = z$target )
    return(modelSh)
  }

  newdata = cbind(as.factor(z$target), tsf)
  colnames(newdata)[1] <- target  # rename target column
  newtask = makeClassifTask(data = newdata, target = target, positive = task$task.desc$positive)
  return(newtask)
}

From my side, this is against the basic OO principle and create high dependencies. To get the inversion of control we could use dependency injection here. Let "convertTSTaskToNormalTask" to be an S3 method, and let method to be an Object that convert the task itself.

So parameter "method" becomes independent from other methods. We code difference "method" object to operate on the task. I would like to change the code like this:

convertTSTaskToNormalTask = function(task, method, pars = NULL) {
  # check if task
  assertClass(task, classes = "Task")
  #check for Time Series Classif Task
  if (!any(class(task) == "TimeSeriesClassifTask"))
    stop("Task is not a 'TimeSeriesClassifTask'. Please check task.")
  #check valid feature extraction method
  target = task$task.desc$target
  z = getTaskData(task, target.extra = TRUE, recode.target = "-1+1")
  featMat = do.call(what = method$fun_getFeatures, args = pars), 
  newdata = cbind(as.factor(z$target), tsf)
  colnames(newdata)[1] <- target  # rename target column
  newtask = makeClassifTask(data = newdata, target = target, positive = task$task.desc$positive)
  return(newtask)
}

# Outside , we call this function like this
task = GunpointTask;
mmethod = newFourier()
mpars = list(filter = bla, boundary = bla,  fft.coeff = bla)
convertTSTaskToNormalTask(task, mmethod, pars = mpars)

The point to discuss is that maybe we could just pass the mpars into mmethod = newFourier(mpars), which is more compact?

smilesun commented 7 years ago

@berndbischl answer requested.

berndbischl commented 7 years ago

this is not urgent, please fix ALL of the other stuff first

smilesun commented 7 years ago

look at code of imputation.R by Michell which is using the strategy pattern

lbeggel commented 7 years ago

Work in progress

smilesun commented 6 years ago

This is solved in the branch #2005