sneumann / CAMERA

This is the git repository matching the Bioconductor package CAMERA: Collection of annotation related methods for mass spectrometry data
11 stars 21 forks source link

From XCMS3 - NA values in xcmsSet. Use fillPeaks() #33

Open lecorguille opened 6 years ago

lecorguille commented 6 years ago

Hi,

It seems that CAMERA doesn't really like the NAs left by fillChromPeaks:

Error in .local(object, ...) : NA values in xcmsSet. Use fillPeaks()
Calls: annotatediff -> diffreport -> diffreport -> .local

Although, the data were processed by FillChromPeaksParam

> processHistory(xdata)
[...]
[[8]]
Object of class "XProcessHistory"
 type: Missing peak filling
 date: Wed Jul 11 11:28:54 2018
 info:
 fileIndex: 1,2,3,4
 Parameter class: FillChromPeaksParam

Note that I'm using a toy example for my functional tests

> apply(featureValues(xdata, filled = FALSE), MARGIN = 2, FUN = function(z) sum(is.na(z)))
ko15.CDF ko16.CDF wt15.CDF wt16.CDF
    4252     4140     4251     4218
> apply(featureValues(xdata), MARGIN = 2, FUN = function(z) sum(is.na(z)))
ko15.CDF ko16.CDF wt15.CDF wt16.CDF
    2694     2702     2806     2680

My concern is that I should not be the first to chain xcms3 and camera. With xcms 1.*, I had even more missing values which were fill with 0 and CAMERA didn't complain.

ping: @jotsetung ... maybe you have an idea.

jorainer commented 6 years ago

I'll have a look. Thanks for reporting @lecorguille !

jorainer commented 6 years ago

Now you can use featureValues(x, value = "into", missing = 0) to have the same behavior as in xcms < 3 (i.e. report 0 for missing peaks instead of NA). You can also specify missing = "rowmin_half" in which case NA values are replaced with half of the row's minimum value.

lecorguille commented 6 years ago

I missed something ... how can I chain XCMS3 with CAMERA. How can I change the NA within the XCMSnExp object to get some 0in my xcmsSet object? By the way, thanks for your help

jorainer commented 6 years ago

Ah, there is currently no way to chain xcms3 with CAMERA - I'll work on that once I find the time. I'll have a look into CAMERA...

which call exactly did result in the above error?

stanstrup commented 6 years ago

You can chain them using xset_new <- as(xset_old, "xcmsSet"). But as @lecorguille said I guess you need to avoid these NAs in whatever call CAMERA uses to take out the values. Maybe you can simply replace all NA with 0 in the peaks table?

jorainer commented 6 years ago

Thanks @stanstrup. Meanwhile I found that the error is thrown by xcms own diffreport function (i.e. after converting an XCMSnExp to a xcmsSet). One simple solution is indeed to simply replace NA with 0 in the groupval call - but I somehow don't like this idea of replacing missing values with 0. diffreport will in fact include these 0 values in the differential abundance analysis. Mean and variance estimation will simply be wrong for those rows that have 0s in them - would be better to keep them as NA even if this reduces then the degrees of freedom - but the mean and variance are estimated on the actual values... now, what to do...

jorainer commented 6 years ago

I've added now an additional parameter missing to the diffreport function. Setting missing = 0 in the diffreport call on the xcmsSet object (that was converted from an XCMSnExp) is now equivalent with a previous fillPeaks(xset) call.

@lecorguille could you test the most recent xcms from github and call diffreport with missing = 0 to see if this solves your problem? The last fix should solve the problem anyway, as I've converted the error into a warning.

lecorguille commented 6 years ago

@jotsetung You will hate me! But as you may know, my work are around the W4M project and Galaxy. Thus, I can't update easily the R package, I'm stuck to 3.4.1 because of Bioconda and Conda-forge. So to test and integrate your useful patch, it's a shame but I copy/paste in my lib file 😞

Test

It's better because indeed I didn't get NA anymore but there are still some Inf and mt.teststat doesn't like that

Error in mt.teststat(testval, testclab, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
Calls: annotatediff -> diffreport -> mt.teststat -> .C

Dummy test

If for "fun", to see if I pass eventually diffreport, I had an ugly

values[, c(c1, c2)][is.infinite(values[, c(c1, c2)])] <- missing[1]

I get this error message

Error in pval(testval, testclab, tstat) : could not find function "pval"
Calls: annotatediff -> diffreport

Which is strange because I can get the ?pval Is it related to my strange solution to apply your patch (see first part of this message)?

lecorguille commented 6 years ago

@jotsetung I took time to test with a fresh version of R 3.5.1 on my Mac and an installation from GitHub of xmcs to eliminate my strange solution to patch xcms (copy/paste of the functions in my script)

Test without any strange solution to patch xcms

I get the same

Error in mt.teststat(testval, testclab, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
Calls: annotatediff -> diffreport -> mt.teststat -> .C

Dummy test 2

To add the ugly

values[, c(c1, c2)][is.infinite(values[, c(c1, c2)])] <- missing[1]

I took [again] the whole diffreport function within my script

I still get this error message

Error in pval(testval, testclab, tstat) : could not find function "pval"
Calls: annotatediff -> diffreport
jorainer commented 6 years ago

Oh, sorry about that. I'll check that there are no Inf. About the pval, seems xcms has its own implementation to calculate p-values (????). you would have to call xcms:::pval if you source your own script.

I'll try to fix the above (eventually also in 3.4.1), but I'm now on holidays for 2 weeks ... so it won't be immediate.

lecorguille commented 6 years ago

I’m also on holiday from today (but me it’s for 3 weeks 😝)

Many thanks

Le ven. 20 juil. 2018 à 19:03, Johannes Rainer notifications@github.com a écrit :

Oh, sorry about that. I'll check that there are no Inf. About the pval, seems xcms has its own implementation to calculate p-values (????). you would have to call xcms:::pval if you source your own script.

I'll try to fix the above (eventually also in 3.4.1), but I'm now on holidays for 2 weeks ... so it won't be immediate.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sneumann/CAMERA/issues/33#issuecomment-406663723, or mute the thread https://github.com/notifications/unsubscribe-auth/AKjyFoFmD2biDhaModX2ooPJ44BAKbCZks5uIg16gaJpZM4VK1Bi .

jorainer commented 6 years ago

My 2 weeks are over 😢 . Now, I've also included the fix for infinite values in the diffreport method, so with the github devel version it should work now. For the other problem (adding the diffreport method to your script): you should then use xcms:::pval instead of a simple pval call, because pval is an xcms-internal function that is not exported.

lecorguille commented 5 years ago

Arf! It's the turn of getEIC()

Start grouping after retention time.
Created 350 pseudospectra.
Generating peak matrix!
Run isotope peak annotation
 % finished: 10  20  30  40  50  60  70  80  90  100
Found isotopes: 151
Error in .local(object, ...) :
  Please use fillPeaks() to fill up NA values !
Calls: annotatediff -> diffreport -> getEIC -> getEIC -> .local
Execution halted

TEST 1

I tried to tweak a little getEIC() as you did with diffreport()

if (missing(mzrange)) {
        if (missing(groupidx))
        stop("No m/z range or groups specified")
        groupvalmzmin = groupval(object, value = "mzmin")
        groupvalmzmax = groupval(object, value = "mzmax")
        ## Optionally replace NA values with the value provided with missing
        if (length(missing)) {
            if (is.numeric(missing)) {
                ## handles NA, Inf and -Inf
                groupvalmzmin[!is.finite(groupvalmzmin)] <- missing[1]
                groupvalmzmax[!is.finite(groupvalmzmax)] <- missing[1]
            } else
            stop("'missing' should be numeric")
        }
        ## Check against missing Values
        if (any(is.na(groupvalmzmin)))
            warning("`NA` values in xcmsSet. Use fillPeaks() on the object to fill",
                    "-in missing peak values. Note however that this will also ",
                    "insert intensities of 0 for peaks that can not be filled in.")
        mzmin <- -rowMax(-groupvalmzmin)
        mzmax <- rowMax(groupvalmzmax)
        mzrange <- matrix(c(mzmin[grpidx], mzmax[grpidx]), ncol = 2)
    } else if (all(c("mzmin","mzmax") %in% colnames(mzrange)))
    mzrange <- mzrange[,c("mzmin", "mzmax"),drop=FALSE]
    else if (is.null(dim(mzrange)))
    stop("mzrange must be a matrix")
    colnames(mzrange) <- c("mzmin", "mzmax")

But ...

Start grouping after retention time.
Created 350 pseudospectra.
Generating peak matrix!
Run isotope peak annotation
 % finished: 10  20  30  40  50  60  70  80  90  100
Found isotopes: 151
Create profile matrix with method 'bin' and step 0.1 ... OK
Applying retention time correction to ko15.CDF.
Error in profEIC(object, mzrange = mzrange, rtrange = rtrange, step = step) :
  'mzrange' number 1 (0, 487.300018310547) is outside of the mz value range of 'object'
Calls: annotatediff ... getEIC -> <Anonymous> -> <Anonymous> -> .local -> profEIC
Execution halted

It comes from the getEIC() within getEIC():

currenteic <- xcms:::getEIC(lcraw, mzrange, rtrange, step = prof$step)

If it can help:

> print(range(mzrange[1, ]))
[1]   0.0 487.3
> print(lcraw@mzrange)
[1] 200 600

Am I the first one who want to chain xcms 3.0 with CAMERA or is it because of my toy samples?

jorainer commented 5 years ago

Ah that's a bummer. And the fun thing is that there is a CAMERA:::getEIC and an xcms::getEIC - I'm really wondering which one is called by your function.

Could you provide the code of your toy example, i.e. from where you read the file, do peak detection and then also the CAMERA code? I could then directly have a look what happens and could seek an ideal solution.

lecorguille commented 5 years ago

OH! I used to run the whole script and now that I have extracting the main lines ... Indeed, it isn't an issue with CAMERA but with diffreport (I though that CAMERA launched diffreport). Sorry for the confusion and the wrong place.

So, you can find a R script with only the lines needed : https://gist.github.com/lecorguille/a682178c983126f42e79007450e20e19

Many thanks @jotsetung

jorainer commented 5 years ago

Thanks for sharing the code! I fixed the relevant code in xcms::getEIC. In fact the function was throwing an error if there was a single NA in the full data set, even if no EIC was about to get extracted for the relevant intensity. Until you can update to the new xcms version, you should also use the getEIC,xcmsSet code (in the methods-xcmsSet.R file) from xcms in your script (see changes introduced by https://github.com/sneumann/xcms/commit/a816490bcbdfda5edb9d9b1abf5ee71590c9425c).

One thing which is important in your code: you have h="480" and w="640", i.e. you are providing the height and width as character and the diffreport had no check for this (took me quite a while to understand why the code failed).

raagbtitl commented 5 years ago

When I am going into CAMERA from xcms, I am getting this issue: Error in rcorr(peaktable[,pi], type="pearson") : NA/NaN/Inf in foreign function call (arg 1)

I found above line in https://rdrr.io/github/sneumann/CAMERA/src/R/fct_groupCorr.R

How can I fix this?

jorainer commented 5 years ago

Dear @raagbtitl could you provide some more information, such as the commands you used to get there as well as the output of your sessionInfo?

raagbtitl commented 5 years ago

Dear Johannes, I used following below CAMERA wrapper command from MAIT (https://www.rdocumentation.org/packages/MAIT/versions/1.6.0/topics/peakAnnotation) for peak annotation : MAIT::peakAnnotation(MAIT.object = MAIT.object, perfwhm = 0.6,annotateAdducts = TRUE, printSpectraTable=FALSE)

Also sessionInfo information below:

R version 3.5.0 (2018-04-23) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] tools parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] dplyr_0.7.6 optparse_1.6.0 magrittr_1.5 [4] RColorBrewer_1.1-2 MAIT_1.15.0 pls_2.6-0 [7] CAMERA_1.37.0 xcms_3.3.2 MSnbase_2.7.3 [10] ProtGenerics_1.13.0 mzR_2.15.1 Rcpp_0.12.18 [13] Biobase_2.41.2 BiocGenerics_0.27.1 BiocParallel_1.15.8

loaded via a namespace (and not attached): [1] backports_1.1.2 spam_2.2-0 Hmisc_4.1-1 [4] plyr_1.8.4 igraph_1.2.2 lazyeval_0.2.1 [7] sp_1.3-1 splines_3.5.0 AlgDesign_1.1-7.3 [10] ggplot2_3.0.0 digest_0.6.15 foreach_1.4.4 [13] BiocInstaller_1.31.3 htmltools_0.3.6 gdata_2.18.0 [16] checkmate_1.8.5 cluster_2.0.7-1 doParallel_1.0.11 [19] sfsmisc_1.1-2 limma_3.37.3 recipes_0.1.3 [22] gower_0.1.2 gmodels_2.18.1 dimRed_0.1.0 [25] colorspace_1.3-2 crayon_1.3.4 graph_1.59.0 [28] bindr_0.1.1 impute_1.55.0 survival_2.42-6 [31] iterators_1.0.10 glue_1.3.0 DRR_0.0.3 [34] gtable_0.2.0 ipred_0.9-6 zlibbioc_1.27.0 [37] questionr_0.6.3 kernlab_0.9-27 ddalpha_1.3.4 [40] DEoptimR_1.0-8 maps_3.3.0 abind_1.4-5 [43] scales_1.0.0 vsn_3.49.1 miniUI_0.1.1.1 [46] xtable_1.8-2 spData_0.2.9.3 htmlTable_1.12 [49] magic_1.5-8 foreign_0.8-71 spdep_0.7-7 [52] preprocessCore_1.43.0 dotCall64_1.0-0 Formula_1.2-3 [55] stats4_3.5.0 lava_1.6.3 prodlim_2018.04.18 [58] getopt_1.20.2 htmlwidgets_1.2 gplots_3.0.1 [61] acepack_1.4.1 pkgconfig_2.0.1 XML_3.98-1.16 [64] deldir_0.1-15 nnet_7.3-12 caret_6.0-80 [67] tidyselect_0.2.4 rlang_0.2.1 reshape2_1.4.3 [70] later_0.7.3 munsell_0.5.0 broom_0.5.0 [73] geometry_0.3-6 stringr_1.3.1 mzID_1.19.0 [76] ModelMetrics_1.2.0 RhpcBLASctl_0.18-205 knitr_1.20 [79] robustbase_0.93-2 caTools_1.17.1.1 purrr_0.2.5 [82] RANN_2.6 bindrcpp_0.2.2 RBGL_1.57.0 [85] nlme_3.1-137 mime_0.5 RcppRoll_0.3.0 [88] compiler_3.5.0 rstudioapi_0.7 e1071_1.7-0 [91] affyio_1.51.0 klaR_0.6-14 MassSpecWavelet_1.47.0 [94] tibble_1.4.2 stringi_1.2.4 highr_0.7 [97] fields_9.6 lattice_0.20-35 Matrix_1.2-14 [100] multtest_2.37.0 LearnBayes_2.15.1 pillar_1.3.0 [103] combinat_0.0-8 MALDIquant_1.18 data.table_1.11.4 [106] bitops_1.0-6 httpuv_1.4.5 agricolae_1.2-8 [109] R6_2.2.2 latticeExtra_0.6-28 pcaMethods_1.73.0 [112] affy_1.59.0 promises_1.0.1 KernSmooth_2.23-15 [115] gridExtra_2.3 IRanges_2.15.16 codetools_0.2-15 [118] boot_1.3-20 MASS_7.3-50 gtools_3.8.1 [121] assertthat_0.2.0 CVST_0.2-2 withr_2.1.2 [124] S4Vectors_0.19.19 expm_0.999-2 plsgenomics_1.5-1 [127] grid_3.5.0 rpart_4.1-13 timeDate_3043.102 [130] coda_0.19-1 tidyr_0.8.1 class_7.3-14 [133] shiny_1.1.0 lubridate_1.7.4 base64enc_0.1-3