rietho / IPO

A Tool for automated Optimization of XCMS Parameters
http://bioconductor.org/packages/IPO/

xcmsSetStatistic() may produce model with suboptimal results #61

Open rickhelmus opened 6 years ago

rickhelmus commented 6 years ago

Hello,

Recently I started delving into the (very interesting!) IPO package. By pure coincidence I noticed during a test run that the final results were not optimal.

Reproducible example:

# test dataset with example mzML files
devtools::install_github("rickhelmus/patRoonData")

anaList <- list.files(patRoonData::exampleDataPath(), pattern = "\\.mzML", full.names = TRUE)

# starting parameter ranges for centWave peak picking
ppParams <- IPO::getDefaultXcmsSetStartingParams("centWave")
ppParams$min_peakwidth <- c(4, 12)
ppParams$ppm <- c(3, 10)
ppParams$method <- "centWave"

# optimize using the fourth and fifth example files
iOpt <- IPO::optimizeXcmsSet(anaList[4:5], ppParams, nSlaves = 1)

The experimental results and plots of the fourth (and final) experiment look promising:

> iOpt[[4]]$response
      exp num_peaks notLLOQP num_C13      PPS
 [1,]   1       543      288     118 48.34722
 [2,]   2       170       65      46 32.55385
 [3,]   3       573      314     118 44.34395
 [4,]   4       208       80      60 45.00000
 [5,]   5       568      306     122 48.64052
 [6,]   6       186       74      46 28.59459
 [7,]   7       596      320     121 45.75312
 [8,]   8       228       93      64 44.04301
 [9,]   9       543      288     118 48.34722
[10,]  10       170       65      46 32.55385
[11,]  11       573      314     118 44.34395
[12,]  12       208       80      60 45.00000
[13,]  13       567      306     122 48.64052
[14,]  14       186       74      46 28.59459
[15,]  15       595      321     119 44.11526
[16,]  16       228       93      64 44.04301
[17,]  17       266       75      80 85.33333
[18,]  18       572      295     125 52.96610
[19,]  19       195       75      52 36.05333
[20,]  20       235       70      76 82.51429
[21,]  21       365      153      98 62.77124
[22,]  22       258       69      82 97.44928
[23,]  23       269       80      84 88.20000
[24,]  24       266       75      80 85.33333
[25,]  25       266       75      80 85.33333
[26,]  26       266       75      80 85.33333

[attached image: rsm_4]

However, the final result calculated by the model has a much lower score:

> max(iOpt[[4]]$response[, 5])
[1] 97.44928

> iOpt[[4]]$PPS
    ExpId    #peaks    #NonRP       #RP       PPS 
  0.00000 322.00000 124.00000  88.00000  62.45161

I suspect the final combination of parameters hits a corner case where XCMS suddenly yields very different results from what the model predicts. However, I'm just brushing up my DoE knowledge, so any ideas here would be welcome!

In this case the final result is lower than that of the third experiment (PPS 85.3), so resultIncreased() returns FALSE. Interestingly, since the max_settings used to pick the 'best' experimental iteration are calculated by the model (i.e. not taken from the actual results), the last experiment is still reported as the optimum.

Anyway, I noticed that IPO is (unfortunately) no longer actively maintained. Still, I hope to start some discussion on what a solution to this could look like. A simple approach might be to check whether the response obtained with the model-derived parameters is actually the best, and if it is not, fall back to the best conditions among the experiments that led to the model. What do you think?
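For illustration, a rough sketch of such a check, based only on the layout of the result object shown above (pickBestSettings is a made-up name, not part of IPO):

# Rough sketch (not IPO code): compare the PPS obtained with the model-derived
# parameters against the best PPS observed in the experiments and, if the model
# result is worse, return the index of the experiment to fall back to.
pickBestSettings <- function(optStep) {
  expPPS   <- optStep$response[, 5]   # PPS column of the DoE responses
  modelPPS <- optStep$PPS[["PPS"]]    # PPS of the model-derived optimum
  if (modelPPS >= max(expPPS))
    return(NULL)                      # model result is at least as good; keep it
  which.max(expPPS)                   # otherwise: index of the best experiment
}

For the run above, pickBestSettings(iOpt[[4]]) would point to experiment 22, whose parameter settings could then be used instead of the model-derived ones.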

rietho commented 6 years ago

Hi @rickhelmus, thanks a lot for your comment and for starting this discussion.

The point you brought up is a very good one. I did notice this behaviour before, but never had the chance to implement this enhancement. To do so, the function optimizeXcmsSet would need to be adjusted.

You are right that IPO is not actively maintained at the moment. There is still some discussion about how IPO might be developed further, but unfortunately I won't be able to work on it in the near future. Maybe this issue is a good opportunity to get the ball rolling.

rickhelmus commented 6 years ago

Thanks for starting the discussion (and sorry for my belated reply).

I've implemented the simple change that switches to the parameters of the experiment with the better response when this situation occurs (with some user-defined allowed deviation). This seems to improve things, at least.
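Roughly the logic, as a sketch (not the actual patch; maxDeviation is a made-up name for the user-defined allowed deviation, here 10% relative):

# Sketch: only switch to the best experimental settings when the model-derived
# PPS falls more than maxDeviation (relative) below the best experimental PPS.
maxDeviation <- 0.1
bestExpPPS <- max(iOpt[[4]]$response[, 5])
modelPPS   <- iOpt[[4]]$PPS[["PPS"]]
useExperimentalSettings <- modelPPS < (1 - maxDeviation) * bestExpPPS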

I also tried adding the sub-optimal results to the model in the hope of improving its predictions. This, however, only seemed to make things worse.