incorrect number of dimensions in PuInc_seeker()

sneumann commented 8 years ago

Hi, I am trying to run the example script with the MTBLS213 data, and get the following error in PuInc_seeker(). Very often this is due to a missing DROP=FALSE in some access, so that a matrix/dataframe gets "helpfully" changed from 2D to just a vector by R.

Yours, Steffen

> xset3
An "xcmsSet" object with 12 samples

Time range: 2.5-1259.8 seconds (0-21 minutes)
Mass range: 100.0112-1499.59 m/z
Peaks: 125327 (about 10444 per sample)
Peak Groups: 7588 
Sample classes: MTBLS213 

Peak picking was performed on MS1.
Profile settings: method = bin
                  step = 0.1

Memory usage: 12.6 MB
> s1 <- PuInc_seeker(XCMSet=xset3,ULtag="CELL_Glc12",Ltag="CELL_Glc13",sep.pos="f")
Error in meanintensities[grep(ULtag, rownames(meanintensities)), ] : 
  incorrect number of dimensions

sneumann commented 8 years ago

Hi, I found the issue, when downloading MTBLS213, and running xcmsSet() on the files, the sampleclasses are missing, but I am unsure which classes are expected from the geoRge script. Is it just sampclass(xset) <- c(rep("CELL_Glc12", 6), rep("CELL_Glc13", 6)) is not enough, since I still get an error. Can you provide the code to set the right classes ? Thanks, Steffen

jcapelladesto commented 8 years ago

Hi Steffen, thanks for reporting your problem, I think I know where this issue comes from. Could you paste here the result from running xset3@phenoData$class ?

sneumann commented 8 years ago

Hi Jordi, thanks for the quick response. Here is the phenoData:

> xset3@phenoData$class 
 [1] CELL_Glc12 CELL_Glc12 CELL_Glc12 CELL_Glc12 CELL_Glc12 CELL_Glc12 CELL_Glc13 CELL_Glc13 CELL_Glc13
[10] CELL_Glc13 CELL_Glc13 CELL_Glc13
Levels: CELL_Glc12 CELL_Glc13

The whole xset3 is:

> xset3
An "xcmsSet" object with 12 samples

Time range: 2.5-1259.8 seconds (0-21 minutes)
Mass range: 100.0112-1499.59 m/z
Peaks: 125327 (about 10444 per sample)
Peak Groups: 10381 
Sample classes: CELL_Glc12, CELL_Glc13 

Peak picking was performed on MS1.
Profile settings: method = bin
                  step = 0.1

Memory usage: 13 MB
> phenoData(xset3)
                              class
CELL_Glc12_05mM_Normo_04 CELL_Glc12
CELL_Glc12_05mM_Normo_05 CELL_Glc12
CELL_Glc12_05mM_Normo_06 CELL_Glc12
CELL_Glc12_25mM_Normo_16 CELL_Glc12
CELL_Glc12_25mM_Normo_17 CELL_Glc12
CELL_Glc12_25mM_Normo_18 CELL_Glc12
CELL_Glc13_05mM_Normo_01 CELL_Glc13
CELL_Glc13_05mM_Normo_02 CELL_Glc13
CELL_Glc13_05mM_Normo_03 CELL_Glc13
CELL_Glc13_25mM_Normo_13 CELL_Glc13
CELL_Glc13_25mM_Normo_14 CELL_Glc13
CELL_Glc13_25mM_Normo_15 CELL_Glc13

jcapelladesto commented 8 years ago

Dear Steffen,

I believe you did not use folders to organise the .mzXML files when preprocessing with XCMS, we did not expect that, honestly because we (our lab) always use folders. Since geoRge depends on xset3@phenoData$class to define the different experimental conditions, this can be solved by creating them manuallyxset3@phenoData$class <- your_values_vector. As an example, phenoData should look like this:

> phenoData(xset3)
                                         class
CELL_Glc12_05mM_Normo_04 CELL_Glc12_05mM_Normo
CELL_Glc12_05mM_Normo_05 CELL_Glc12_05mM_Normo
CELL_Glc12_05mM_Normo_06 CELL_Glc12_05mM_Normo
CELL_Glc12_25mM_Normo_16 CELL_Glc12_25mM_Normo
CELL_Glc12_25mM_Normo_17 CELL_Glc12_25mM_Normo
CELL_Glc12_25mM_Normo_18 CELL_Glc12_25mM_Normo
CELL_Glc13_05mM_Normo_01 CELL_Glc13_05mM_Normo
CELL_Glc13_05mM_Normo_02 CELL_Glc13_05mM_Normo
CELL_Glc13_05mM_Normo_03 CELL_Glc13_05mM_Normo
CELL_Glc13_25mM_Normo_13 CELL_Glc13_25mM_Normo
CELL_Glc13_25mM_Normo_14 CELL_Glc13_25mM_Normo
CELL_Glc13_25mM_Normo_15 CELL_Glc13_25mM_Normo

This can be done by removing the value at the end of the sample names. Or you could even create a new phenoData(xset3)$class value on your own. As long as it has the following structure (Example as given above):

ULtag/Ltag (_"CELL_Glc12" / "CELLGlc13")
a separator ("_")
condition group names (_"05mM_Normo", "25mMNormo")

geoRge will evaluate the structure of the string by checking the position of the separator with sep.pos value, and then save the values for the PuInc analysis.

sneumann commented 8 years ago

Ok, I am one step further:

> phenoData(xset3)
                                         class
CELL_Glc12_05mM_Normo_04 CELL_Glc12_05mM_Normo
CELL_Glc12_05mM_Normo_05 CELL_Glc12_05mM_Normo
...
CELL_Glc13_25mM_Normo_15 CELL_Glc13_25mM_Normo

> s1 <- PuInc_seeker(XCMSet=xset3,ULtag="CELL_Glc12",Ltag="CELL_Glc13",sep.pos="f")
 Error in `[.data.frame`(D1, , filtsampsint) : undefined columns selected

I think that is because filtsampsint contains NA values. I changed the subselection now to filtsampsint <- na.omit(filtsamps[apply(meanintensities, 2, function(x) all(x < PuInc.int.lim))]) and successfully got the s1 object. You'll get a pull request with that change soon.

The next issue I get is

> s2 <- basepeak_finder(PuIncR=s1,XCMSet=xset3,ULtag="CELL_Glc12",Ltag="CELL_Glc13",
+   sep.pos="f",UL.atomM=12.0,L.atomM=13.003355,
+   ppm.s=6.5,Basepeak.minInt=2000)
 Error in if ((mi12 > Basepeak.minInt)) { : 
  missing value where TRUE/FALSE needed

The s1 object is

> str(s1)
List of 4
 $ PuInc           : num [1:40, 1:4] 112 112 129 130 141 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:40] "1" "2" "3" "4" ...
  .. ..$ : chr [1:4] "mzmed" "rtmed" "rtmin" "rtmax"
 $ PuInc_conditions: Named chr [1:40] "25mM_Normo" "25mM_Normo" "05mM_Normo" "05mM_Normo" ...
  ..- attr(*, "names")= chr [1:40] "105" "106" "429" "447" ...
 $ pvalue          :'data.frame':   1297 obs. of  2 variables:
  ..$ 05mM_Normo: num [1:1297] 0.82 0.274 0.591 0.22 0.596 ...
  ..$ 25mM_Normo: num [1:1297] 0.936 0.662 0.453 0.305 0.847 ...
 $ foldchange      :'data.frame':   1297 obs. of  2 variables:
  ..$ 05mM_Normo: num [1:1297] 1.07 1.16 1.29 1.22 -1.03 ...
  ..$ 25mM_Normo: num [1:1297] -1.01 -1.06 1.04 -1.28 1.03 ...

With the following traceback:

8 FUN(X[[i]], ...) 
7 lapply(X = X, FUN = FUN, ...) 
6 sapply(cond, USE.NAMES = F, simplify = T, function(x) {
    mi12 <- mi[grep(ULtag, names(mi))]
    mi12 <- mi12[grep(x, names(mi12))]
    if ((mi12 > Basepeak.minInt)) { ... 
5 sapply(cond, USE.NAMES = F, simplify = T, function(x) {
    mi12 <- mi[grep(ULtag, names(mi))]
    mi12 <- mi12[grep(x, names(mi12))]
    if ((mi12 > Basepeak.minInt)) { ... at george.R#194
4 FUN(X[[i]], ...) 
3 lapply(rownames(res_inc), function(y) {
    isot <- sapply(1:max_atoms(res_inc[y, "mzmed"], L.atomM), 
        function(x) {
            res_inc[y, "mzmed"] - (x * mass_diff) ... 
2 lapply(rownames(res_inc), function(y) {
    isot <- sapply(1:max_atoms(res_inc[y, "mzmed"], L.atomM), 
        function(x) {
            res_inc[y, "mzmed"] - (x * mass_diff) ... at george.R#149
1 basepeak_finder(PuIncR = s1, XCMSet = xset3, ULtag = "CELL_Glc12", 
    Ltag = "CELL_Glc13", sep.pos = "f", UL.atomM = 12, L.atomM = 13.003355, 
    ppm.s = 6.5, Basepeak.minInt = 2000)

Any ideas ? Yours, Steffen

sneumann commented 8 years ago

Debugging a bit further I find that mi12 is NULL, and mi is (NA, NA, 1267, 936). I am running R-3.2.3, do you have a version with different NA handling behaviour ?

jcapelladesto commented 8 years ago

Dear Steffen,

Thank you for that. I think that might be it, when calculating the miobject (mean intensity) the mean is not ready to deal with NA values, might as well use the na.rm=T for that.

I also use version 3 from R (R-3.1.0). So we should not find version compatibility issues (hopefully). Edit: I assume you are not using fillPeaks() when running XCMS, that is the most probable source of NA values.

sneumann commented 8 years ago

Still debugging here. Just to be sure, you don't fillPeaks() the xsets ?

jcapelladesto commented 8 years ago

On the contrary, we do use fillPeaks() when running XCMS.

Edit:

I just checked "Example.R" and saw that the xset3 <- fillPeaks(xset3) was missing. I just added it.

Sorry for the inconvenience.

sneumann commented 8 years ago

Ok, that was it. fillPeaks() was missing from Example.R Thanks for your patience, yours, Steffen

jcapelladesto / geoRge

incorrect number of dimensions in PuInc_seeker() #1