sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
177 stars 81 forks source link

Definition of anchor peaks for subset-based peakGroups alignment not correct #702

Closed jorainer closed 5 months ago

jorainer commented 7 months ago

The current implementation of the subset-based peakGroups alignment to define the anchor peaks currently does not correctly select the peaks based on the minFraction parameter: minFraction is referring to the total number of samples and not the total number of samples of the subset. Example:

library(xcms)
faahko_3_files <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                    system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                    system.file('cdf/KO/ko18.CDF', package = "faahKO"))

faahko_od <- readMSData(faahko_3_files, mode = "onDisk")
xod_x <- findChromPeaks(
    faahko_od, param = CentWaveParam(noise = 10000, snthresh = 40,
                                     prefilter = c(3, 10000)))
pdp <- PeakDensityParam(sampleGroups = rep(1, 3))
xod_xg <- groupChromPeaks(xod_x, param = pdp)

Using the first and last sample for subset and setting minFraction = 0.6 we would expect to get anchor peaks only if a peak was identified in both samples, but

p <- PeakGroupsParam(subset = c(1, 3), minFraction = 0.6)
adjustRtimePeakGroups(xod_xg, p)
     ko15.CDF ko18.CDF
FT21 2679.783       NA
FT15 2678.218 2684.479
FT16 2679.783 2682.914
FT20 2679.783 2684.479
FT24 2681.348 2684.479
FT08 2783.070       NA
FT04 2786.200 2787.766
FT46 2786.200 2787.766
FT01 2787.765 2787.766
FT45       NA 2916.092
FT06 2923.916 2923.917
FT05 2994.338 2994.339
FT02       NA 3250.992
FT27 3276.030 3285.421
FT31 3316.719 3340.194
FT41       NA 3349.584
FT03 3379.317 3387.143
FT32 3384.012 3410.617
FT33       NA 3410.617
FT11 3413.746       NA
FT10 3415.311 3416.877
FT28 3454.435       NA
FT42 3476.344 3482.605
FT17 3491.994       NA
FT09 3496.688       NA
FT07 3496.688 3501.384
FT35 3512.338 3523.294
FT34 3515.468 3524.859
FT43 3576.501 3584.327
FT30 3590.586       NA
FT18 3610.930 3614.061
FT12 3618.755 3626.581
FT13 3618.755 3626.581
FT14 3657.879 3664.140
FT39 3662.574 3704.828
FT40 3662.574 3706.393
FT29 3682.918 3687.614
FT44 3703.262 3712.653
FT47       NA 3709.523
FT36       NA 3767.426
FT19 3819.069 3830.025
FT26       NA 3870.713
FT25 4047.552 4052.248
FT23 4099.196       NA

Thus, anchor peaks are defined even if there are no peaks detected in one of the two subset sample (NA reported).

jorainer commented 7 months ago

This can be nailed down to the internal .getPeakGroupsRtMatrix function:

res <- xcms:::.getPeakGroupsRtMatrix(chromPeaks(xod_xg),
                                     featureDefinitions(xod_xg)$peakidx,
                                     sampleIndex = c(1, 3),
                                     missingSample = 0,
                                     extraPeaks = 1L)
res
         CP011    CP199
 [1,] 2679.783       NA
 [2,] 2678.218 2684.479
 [3,] 2679.783 2682.914
 [4,] 2679.783 2684.479
 [5,] 2681.348 2684.479
 [6,] 2783.070       NA
 [7,] 2786.200 2787.766
 [8,] 2786.200 2787.766
 [9,] 2787.765 2787.766
[10,]       NA 2916.092
[11,] 2923.916 2923.917
[12,] 2994.338 2994.339
[13,]       NA 3250.992
[14,] 3276.030 3285.421
[15,] 3316.719 3340.194
[16,]       NA 3349.584
[17,] 3379.317 3387.143
[18,] 3384.012 3410.617
[19,]       NA 3410.617
[20,] 3413.746       NA
[21,] 3415.311 3416.877
[22,] 3454.435       NA
[23,] 3476.344 3482.605
[24,] 3491.994       NA
[25,] 3496.688       NA
[26,] 3496.688 3501.384
[27,] 3512.338 3523.294
[28,] 3515.468 3524.859
[29,] 3576.501 3584.327
[30,] 3590.586       NA
[31,] 3610.930 3614.061
[32,] 3618.755 3626.581
[33,] 3618.755 3626.581
[34,] 3657.879 3664.140
[35,] 3662.574 3704.828
[36,] 3662.574 3706.393
[37,] 3682.918 3687.614
[38,] 3703.262 3712.653
[39,]       NA 3709.523
[40,]       NA 3767.426
[41,] 3819.069 3830.025
[42,]       NA 3870.713
[43,] 4047.552 4052.248
[44,] 4099.196       NA

So, that function needs to be fixed to return only non-NA values for this setting.