How do we want FDA Feature Extraction to behave?

pfistfl commented 7 years ago

Here is a quick writeup of what I have done so far; I would be glad to get some input.

FDATasks are Tasks whose task description contains two additional elements, fd.features and fd.grids. fd.features describes which columns belong to which functional covariate, and fd.grids specifies the measurement grid for each functional covariate.
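For readers unfamiliar with the layout, here is a minimal sketch of what fd.features and fd.grids could look like for a toy data.frame. The column names, the grid values, and the use of named lists of column names are assumptions for illustration; the branch may store them differently (e.g. as column indices).

```r
# Toy data (assumed): 3 observations, one scalar feature (h2o) and
# two functional covariates stored as blocks of columns.
df = data.frame(
  h2o = c(0.1, 0.2, 0.3),
  UVVIS.1 = c(1, 2, 3), UVVIS.2 = c(2, 3, 4), UVVIS.3 = c(3, 4, 5),
  NIR.1 = c(5, 6, 7), NIR.2 = c(6, 7, 8)
)

# fd.features: which columns belong to which functional covariate
fd.features = list(
  UVVIS = c("UVVIS.1", "UVVIS.2", "UVVIS.3"),
  NIR = c("NIR.1", "NIR.2")
)

# fd.grids: the measurement grid (e.g. wavelengths) of each covariate,
# one grid value per column
fd.grids = list(
  UVVIS = c(250, 260, 270),
  NIR = c(900, 910)
)

# Consistency check: every grid has as many points as its covariate has columns
stopifnot(identical(lengths(fd.features), lengths(fd.grids)))
```

Everything not listed in fd.features (here h2o) is an ordinary scalar feature.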

I implemented some rather simple feature extraction functions, such as min, max, mean, etc., along with the option to extract Fourier transform features.
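To make the idea concrete, here is a rough base R sketch of what such extractors compute on one functional covariate, given as a matrix with one row per observation and one column per grid point. The function names meanFeature, minMaxFeature and fourierFeature are hypothetical stand-ins, not the functions in the branch, and the Fourier variant simply returns the amplitudes of the discrete Fourier transform.

```r
# One row per observation, one column per grid point (toy values)
curves = matrix(c(1, 2, 3, 4,
                  2, 2, 2, 2), nrow = 2, byrow = TRUE)

# Summary extractors: one (or two) numbers per curve
meanFeature = function(x) data.frame(mean = rowMeans(x))
minMaxFeature = function(x) {
  data.frame(min = apply(x, 1, min), max = apply(x, 1, max))
}

# Fourier extractor: amplitude of each DFT coefficient, per curve
fourierFeature = function(x) {
  amp = do.call(rbind, lapply(seq_len(nrow(x)), function(i) Mod(fft(x[i, ]))))
  colnames(amp) = paste0("amp", seq_len(ncol(amp)))
  as.data.frame(amp)
}

mean.feats = meanFeature(curves)
minmax.feats = minMaxFeature(curves)
fourier.feats = fourierFeature(curves)
```

The first Fourier amplitude is the absolute sum of the curve, so it carries the same information as the mean; in practice one would probably select or drop coefficients.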

The current call looks like this (UVVIS and NIR are the functional covariates, each corresponding to a few hundred columns in the data.frame associated with the task):

# On a task
  methods = list("UVVIS" = extractFDAMean(), "NIR" = extractFDAMinMax())
  t = extractFDAFeatures(fuelsubset.task, feat.methods = methods)
# On a data.frame
  methods = list("UVVIS" = extractFDAMean(), "NIR" = extractFDAFourier())
  df = getTaskData(fuelsubset.task)
  feats = fuelsubset.task$task.desc$fd.features
  grids = fuelsubset.task$task.desc$fd.grids
  t2 = extractFDAFeatures(df, feat.methods = methods, fd.features = feats, fd.grids = grids)

Questions:

What should be returned for case task? Currently t contains a normal task. All functional features are dropped, all scalar features are kept as-is. The extracted features are appended (with proper naming, i.e. UVVIS.mean, NIR.min, NIR.max, ...).
```
> t
$task
Supervised task: mdata
Type: regr
Target: heatan
Observations: 129
Features:
numerics  factors  ordered 
   4        0        0 
Missings: FALSE
Has weights: FALSE
Has blocking: FALSE

$desc
Extraction of features from functional data:
Target: heatan
Functional Features: 2; Extracted features: 2

> getTaskFeatureNames(t$task)
[1] "UVVIS.mean" "NIR.min" "NIR.max" "h2o"
```
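The behaviour described above (drop the functional columns, keep the scalar ones, append the extracted features under prefixed names) can be sketched on a plain data.frame roughly as follows. extractFeaturesSketch is a hypothetical helper, and the methods here are plain functions rather than the extractor objects used in the branch.

```r
# Hypothetical helper: apply one extraction function per functional covariate
extractFeaturesSketch = function(data, feat.methods, fd.features) {
  stopifnot(all(names(feat.methods) %in% names(fd.features)))
  extracted = lapply(names(feat.methods), function(nm) {
    x = as.matrix(data[, fd.features[[nm]], drop = FALSE])
    res = as.data.frame(feat.methods[[nm]](x))
    # Prefix with the covariate name, e.g. "UVVIS.mean"
    names(res) = paste(nm, names(res), sep = ".")
    res
  })
  # Scalar features are all columns not assigned to a functional covariate
  scalars = data[, setdiff(names(data), unlist(fd.features)), drop = FALSE]
  cbind(do.call(cbind, extracted), scalars)
}

# Toy data (assumed): two curves per covariate plus one scalar feature
df = data.frame(
  h2o = c(0.1, 0.2),
  UVVIS.1 = c(1, 3), UVVIS.2 = c(3, 5),
  NIR.1 = c(4, 0), NIR.2 = c(2, 6)
)
fd.features = list(UVVIS = c("UVVIS.1", "UVVIS.2"), NIR = c("NIR.1", "NIR.2"))
methods = list(
  UVVIS = function(x) data.frame(mean = rowMeans(x)),
  NIR = function(x) data.frame(min = apply(x, 1, min), max = apply(x, 1, max))
)

res = extractFeaturesSketch(df, methods, fd.features)
```

For a task input, the same result would then be wrapped into a new task with the original target.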

reExtract (the analog to reimpute) works like reimpute, on either a task or a data.frame.

methods = list("UVVIS" = extractFDAMean(), "NIR" = extractFDAMedian())
lrn = makeExtractFDAFeatsWrapper("regr.rpart", feat.methods = methods,
fd.features = fuelsubset.task$task.desc$fd.features,
fd.grids = fuelsubset.task$task.desc$fd.grids)
mod = train(lrn, fuelsubset.task)
prd = predict(mod, fuelsubset.task)
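The reimpute analogy boils down to this: the extraction step returns the transformed data together with a description of what was done, and that description is re-applied to new data at predict time. A minimal sketch in base R, with hypothetical names (extractWithDesc, reExtractSketch) and a single functional covariate for brevity:

```r
# Re-apply a stored extraction description to (new) data
reExtractSketch = function(newdata, desc) {
  x = as.matrix(newdata[, desc$cols, drop = FALSE])
  out = newdata[, setdiff(names(newdata), desc$cols), drop = FALSE]
  out[[desc$name]] = as.numeric(desc$fun(x))
  out
}

# "Training" step: extract and remember how the extraction was done
extractWithDesc = function(data, cols, fun, name) {
  desc = list(cols = cols, fun = fun, name = name)
  list(data = reExtractSketch(data, desc), desc = desc)
}

train.df = data.frame(h2o = c(0.1, 0.2), NIR.1 = c(1, 2), NIR.2 = c(5, 4))
new.df = data.frame(h2o = c(0.3, 0.4), NIR.1 = c(1, 3), NIR.2 = c(3, 5))

ex = extractWithDesc(train.df, c("NIR.1", "NIR.2"),
  fun = function(x) apply(x, 1, median), name = "NIR.median")
new.feats = reExtractSketch(new.df, ex$desc)
```

A wrapper would store ex$desc inside the model at train time and call the re-extraction inside predict, which is why it needs the column and grid information up front.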

Can we extend the PreprocWrapper somehow, so it works on tasks? For now I have to specify features and grids for every learner which is rather cumbersome. (I might be overlooking something as well...)

> mod
Model for learner.id=regr.rpart.extracted; learner.class=extractFDAFeatsWrapper
Trained on: task.id = mdata; obs = 129; features = 366
Hyperparameters: xval=0

# The model predicts on the transformed values
> mod$learner.model$next.model$learner.model$frame
          var   n  wt       dev     yval  complexity ncompete nsurrogate
1  UVVIS.mean 129 129 4301.6944 24.76577 0.266282267        2          1
2      <leaf>   7   7  284.9134 12.32559 0.010000000        0          0
3  UVVIS.mean 122 122 2871.3160 25.47955 0.125342286        2          1
6         h2o  39  39 1387.1449 22.41268 0.043019220        2          1
12        h2o  30  30  770.1587 21.31361 0.043019220        2          1
24     <leaf>  11  11  186.0255 17.81104 0.010000000        0          0
25     <leaf>  19  19  371.0566 23.34142 0.010000000        0          0
13     <leaf>   9   9  459.9517 26.07626 0.010000000        0          0
7  NIR.median  83  83  944.9869 26.92061 0.040166876        2          2
14        h2o  27  27  311.5995 24.84270 0.017182453        2          0
28     <leaf>   7   7  162.5681 22.04600 0.010000000        0          0
29     <leaf>  20  20   75.1177 25.82155 0.002196515        0          0
15        h2o  56  56  460.6018 27.92246 0.016309022        2          1
30     <leaf>  19  19  167.0618 26.36053 0.010000000        0          0
31     <leaf>  37  37  223.3836 28.72454 0.003885285        0          0

Currently only one feature extraction method per functional covariate is allowed. I think combining feature extractions might come in handy. Should this be done by allowing multiple feat.methods per variable?
Possibly rename extractFDAFeaturesFourier and extractFDAFeaturesWavelets to extractFourierFeatures and extractWaveletFeatures: keep the original names for the feature extraction methods on FDA data described above, and use the new names for a more general method that can also be used on time series data.
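One possible way to allow multiple feat.methods per variable is to accept either a single function or a list of functions for each covariate and column-bind the results. This is only a sketch of that option; applyMethods is a hypothetical helper and the methods are plain functions.

```r
# Hypothetical helper: apply one or several extraction functions to the
# curves of a single functional covariate and bind the results column-wise
applyMethods = function(x, methods, prefix) {
  if (is.function(methods)) methods = list(methods)
  parts = lapply(methods, function(f) as.data.frame(f(x)))
  # unname() so that cbind keeps the column names chosen by each method
  res = do.call(cbind, unname(parts))
  names(res) = paste(prefix, names(res), sep = ".")
  res
}

nir = matrix(c(1, 2, 3,
               4, 6, 8), nrow = 2, byrow = TRUE)
nir.methods = list(
  function(x) data.frame(mean = rowMeans(x)),
  function(x) data.frame(min = apply(x, 1, min), max = apply(x, 1, max))
)

combined = applyMethods(nir, nir.methods, prefix = "NIR")
single = applyMethods(nir, nir.methods[[1]], prefix = "NIR")
```

With this design the existing one-method-per-covariate calls keep working, and name clashes between methods would need an explicit check.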

All those changes are in the fda_pull1_task_featExtract branch @smilesun @lbeggel

smilesun commented 7 years ago

"What should be returned for case task? Currently t contains a normal task." I think returning a task is the best option if the input is a task.
"Can we extend the PreprocWrapper somehow, so it works on tasks?" Bernd and I will have a look at this on Friday.

lbeggel commented 7 years ago

"renaming": sounds reasonable, too

lbeggel commented 7 years ago

Side remark: the example in extractFDAfeatures.R does not work

pfistfl commented 7 years ago

@lbeggel: Fixed Example

@smilesun Perfect! I stumbled upon one more thing. In convertTaskToFDATask we check for the correct specification of fd.features and fd.grids. If we want to be able to run the extraction on data.frames, as needed for the wrapper, I have to check all those things during extraction as well.

Alternative1: Only allow extraction for tasks; this ensures correct specification of fd.grids and fd.features.

Alternative2: Create a helper which does the checking and use it for data.frames.

Additionally, we should discuss the function names. I am not super content with how they turned out. I will try to have everything ready and documented by Friday.

smilesun commented 7 years ago

@pfistfl Alternative3: if the user passes a data frame, it is treated by default as a single channel of features!

pfistfl commented 7 years ago

With regard to Alternative3: that is what we would do with the task as well. This again means just copy-pasting the code from convertTaskToFDATask.

smilesun commented 7 years ago

@pfistfl I don't get what you mean.

pfistfl commented 7 years ago

I think in case the user supplies a data.frame, we have to give them the opportunity to specify fd.features and fd.grids. This means we have to check whether the provided parameters are valid. Passing nothing and assuming it's a single functional covariate is a good default.
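A shared checking helper with that default could look roughly like the sketch below. checkFDSpec and the default covariate name fd1 are hypothetical, the checks are only a guess at what convertTaskToFDATask verifies, and the default assumes the data.frame contains nothing but the functional columns (no target, no scalar features).

```r
# Hypothetical helper: validate fd.features / fd.grids against a data.frame
checkFDSpec = function(data, fd.features = NULL, fd.grids = NULL) {
  # Default: all columns form one functional covariate
  if (is.null(fd.features)) fd.features = list(fd1 = names(data))
  stopifnot(is.list(fd.features), !is.null(names(fd.features)))
  cols = unlist(fd.features, use.names = FALSE)
  if (!all(cols %in% names(data)))
    stop("fd.features refers to columns that are not in data")
  if (anyDuplicated(cols) > 0)
    stop("a column is assigned to more than one functional covariate")
  # Default grid: 1, 2, ..., number of columns of the covariate
  if (is.null(fd.grids)) fd.grids = lapply(fd.features, seq_along)
  if (!setequal(names(fd.grids), names(fd.features)))
    stop("fd.grids and fd.features must name the same covariates")
  for (nm in names(fd.features)) {
    if (length(fd.grids[[nm]]) != length(fd.features[[nm]]))
      stop("grid length does not match number of columns for ", nm)
  }
  list(fd.features = fd.features, fd.grids = fd.grids[names(fd.features)])
}

curves.df = data.frame(t1 = c(1, 2), t2 = c(3, 4), t3 = c(5, 6))
spec = checkFDSpec(curves.df)
bad = try(checkFDSpec(curves.df, fd.features = list(x = c("t1", "nope"))),
  silent = TRUE)
```

Both the task conversion and the data.frame path could call the same helper, which avoids copy-pasting the checks.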

pfistfl commented 7 years ago

Additionally, @Stevo15025 wanted to use the extractFourierFeatures for time series data. We should discuss a suitable format for this. Is the current format suitable for this?

lbeggel commented 7 years ago

Yes, it should be. A time series is a functional covariate that is one line.
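As an illustration of that reading, the sketch below stores each time series as one row of a matrix (a single functional covariate) and uses the time stamps as the grid to turn the Fourier amplitudes into a dominant frequency. The helper dominantFrequency and the toy signals are assumptions, and the code assumes an equally spaced grid.

```r
# Grid: 16 time points, sampled at 16 Hz over one second
time.grid = (0:15) / 16

# Two time series, one per row: sine waves at 2 Hz and 4 Hz
series = rbind(
  sin(2 * pi * 2 * time.grid),
  sin(2 * pi * 4 * time.grid)
)

# Hypothetical extractor: frequency with the largest Fourier amplitude
dominantFrequency = function(x, grid) {
  n = ncol(x)
  dt = grid[2] - grid[1]        # assumes an equally spaced grid
  half = seq_len(floor(n / 2))  # bins 1..n/2, skipping the constant term
  apply(x, 1, function(row) {
    amp = Mod(fft(row))[half + 1]
    half[which.max(amp)] / (n * dt)
  })
}

dom.freq = dominantFrequency(series, time.grid)
```

The grid matters here: without it the features are only bin indices, not frequencies in physical units.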

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

mlr-org / mlr

How do we want FDA Feature Extraction to behave. #1769