sneumann / CAMERA

This is the git repository matching the Bioconductor package CAMERA: Collection of annotation related methods for mass spectrometry data
11 stars 22 forks source link

findIsotope: subscript out of bound #27

Open rromoli opened 6 years ago

rromoli commented 6 years ago

Running findIsotopes() I get the following error:

an <- findIsotopes(an) Generating peak matrix! Run isotope peak annotation % finished: Error in mint[ipeak, , drop = FALSE] : subscript out of bounds

Let me to know how may I help you.

Kind regards Riccardo

rromoli commented 6 years ago

Browsing the code I found the line 684 in xsAnnotate.R that give the error:

    # Ouput counter
    percentOutput(npeaks.global, length(ipeak), ncl, lp)
    browser()
    # Pseudospectrum has more than one peak
    if(length(ipeak) > 1){
      # peak mass and intensity for pseudospectrum
      mz  <- imz[ipeak];
      int <- mint[ipeak, , drop=FALSE]; # error subscript out of bound
      isomatrix <-  findIsotopesPspec(isomatrix, mz, ipeak, int, params)              
    }
  }

I tryed to eliminate the error so I changed the line into: int <- mint[ipeak, drop=FALSE]; but I get a further error in the next step (findIsotopesPspec(), fct_findIsotopes.R, line 38):

Error in int[order(spectra[, 1]), , drop = FALSE] : 
  incorrect number of dimensions

To try to solve the issue can you say to me what type of object is int and why the use of the double comma in mint[ipeak, ,drop=FALSE]?

All the best Ric

sneumann commented 6 years ago

Hi, double comma is an unusual R construct. int[x,y] gives you the subset of int indicated by x,y. If x indicates a single row, R will happily reduce the resulting single-row matrix into a vector, which is a regular pain point because the result has one or two dimensions depending on x. With another argument ,drop=FALSE this is avoided. This and other "fun" aspects are also in http://www.burns-stat.com/pages/Tutor/R_inferno.pdf which is nice read for people who want to get deeper into R and its idiosyncrasies. So mint could be the matrix of intensities, and int would be the subset matrix of the isotope peaks.

So what is the value of ipeak when you hit the error ? Yours, Steffen

rromoli commented 6 years ago

This is what I get running the debug:

debug at /home/erre/binR/CAMERA_1.35.0/CAMERA/R/xsAnnotate.R#644: if (length(ipeak) > 1) {
    mz <- imz[ipeak]
    int <- mint[ipeak, drop = FALSE]
    isomatrix <- findIsotopesPspec(isomatrix, mz, ipeak, int, 
        params)
}
Browse[2]> 
debug at /home/erre/binR/CAMERA_1.35.0/CAMERA/R/xsAnnotate.R#646: mz <- imz[ipeak]
Browse[2]> 
debug at /home/erre/binR/CAMERA_1.35.0/CAMERA/R/xsAnnotate.R#649: int <- mint[ipeak, drop = FALSE]
Browse[2]> 
debug at /home/erre/binR/CAMERA_1.35.0/CAMERA/R/xsAnnotate.R#650: isomatrix <- findIsotopesPspec(isomatrix, mz, ipeak, int, params)
Browse[2]> str(ipeak)
 num [1:58] 3998 4008 4010 4011 4013 ...
Browse[2]> ipeak
 [1] 3998 4008 4010 4011 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4025
[16] 4026 4027 4028 4030 4032 4034 4036 4037 4038 4039 4041 4043 4046 4047 4049
[31] 4058 4069 4075 4081 4084 4088 4095 4098 4099 4100 4117 4134 4160 4238 4301
[46] 4350 4429 4521 4580 4723 4793 5442 5454 5851 5953 6357 6908 6940
Browse[2]> str(mint)
 num [1:6744, 1] 69426 21000 42292 265224 29229 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6744] "120.1/1314" "120/272" "120.1/439" "120/729" ...
  ..$ : chr "teriacaFS_01.mzXML"
Browse[2]> 

Seems that the number of the ipeak index and the row number of the mint matrix do not coincide. The max ipeak value is 6940 while the mint matrix has only 6744 raw.

P.S.

This and other "fun" aspects are also in http://www.burns-stat.com/pages/Tutor/R_inferno.pdf which is nice read for people who want to get deeper into R and its idiosyncrasies.

Thanks for the suggestion, very interesting to read this!

sneumann commented 5 years ago

The issue is that CAMERA fails for grouped single-sample xcmsSet objects:

library(CAMERA)
library(faahKO)

xset1 <- xcmsSet(filepaths(faahko)[1])
xcam1 <- xsAnnotate(xset1, polarity="positive")
xcam1 <- groupFWHM(xcam1, perfwhm=0.6)
xcam1 <- findIsotopes(xcam1, ppm=5, mzabs=0.005)

xset2 <- group(xset1)
xcam2 <- xsAnnotate(xset2, polarity="positive")
xcam2 <- groupFWHM(xcam2, perfwhm=0.6)
xcam2 <- findIsotopes(xcam2, ppm=5, mzabs=0.005)

So as a workaround, try to not group() the xcmsSet if it has only a single sample, or use the following hack to remove an existing grouping: xset2 <- split(xset1, "a")[[1]]

My gut feeling is that somewhere CAMERA is mixing up the values for nrow(peaks(xset)) and nrow(groups(xset)). Fixing that would be the proper solution.

Yours, Steffen

stanstrup commented 4 years ago

I have run into this also. I haven't been able to understand what the proper solution would be but the problem is here: https://github.com/sneumann/CAMERA/blob/3e25fab0cf3ac815e7ec634541f4052df0fbd079/R/xsAnnotate.R#L608

The if statement splits in to "multiple sample or grouped single sample" (uses groupval that returns a row per feature) and "one sample case" (uses @groupInfo that returns a row per peak). But it doesn't really work for the grouped single sample since it runs through all peaks but only has a table with the features that are fewer. I actually wonder why this works when there are multiple samples...

The error is generated in the loop here https://github.com/sneumann/CAMERA/blob/3e25fab0cf3ac815e7ec634541f4052df0fbd079/R/xsAnnotate.R#L636 because it runs through all feature groups. These feature groups contain all the peaks. But it then indexes in the mint object that only has a row per feature, not per peak.

I am very confused. @pspectra is referencing peaks. But shouldn't it reference features?