lgatto / synapter

Label-free data analysis pipeline for optimal identification and quantitation
https://lgatto.github.io/synapter/
4 stars 2 forks source link

Code chunk plotcxseq in fragmentmatching vigette fails #92

Closed lgatto closed 9 years ago

lgatto commented 9 years ago

When running the vignette, I get the following error

plotFragmentMatching(synobj2,
                  key = "TALIDGLAQR", column = "peptide.seq",
                  verbose = FALSE)
Error in if (nchar(sequences[[i]])) { : 
  argument is not interpretable as logical

The offending line fails because

Browse[2]> sequences
[1] "TALIDGLAQR" NA 
sgibb commented 9 years ago

Sorry, but I can't reproduce it:

sequences <- c("TALIDGLAQR", NA)
nchar(sequences)
# [1] 10  2
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base 
lgatto commented 9 years ago

Here is the exact code, from the vignette:

> library(xtable)
> suppressPackageStartupMessages(library(synapter))
> library(synapterdata)
> load("~/Data2/synapter/synobj2.rda")
> library("synapterdata")
> synapter::filterFragments(synobj2, what = "fragments.ident", minIntensity = 70, verbose = FALSE)
synapter::filterFragments(synobj2, what = "spectra.quant", minIntensity = 70, verbose = FALSE)
> setFragmentMatchingPpmTolerance(synobj2, 25)
> fragmentMatching(synobj2, verbose = FALSE)
> plotFragmentMatching(synobj2, key = 28, column = "FragmentMatching", verbose = FALSE)
> plotFragmentMatching(synobj2, key = "TALIDGLAQR", column = "peptide.seq", verbose = FALSE)
Error in if (nchar(sequences[[i]])) { : 
  argument is not interpretable as logical
> traceback()
7: .plotSpectrumVsSpectrum(spectra, sequences = unlist(sequences), 
       common = list(MSnbase:::commonPeaks(spectra[[1]], spectra[[2]], 
           tolerance = tolerance), MSnbase:::commonPeaks(spectra[[2]], 
           spectra[[1]], tolerance = tolerance)), fragments = list(data.frame(mz = mz(spectra[[1]]), 
           ion = fragments, stringsAsFactors = FALSE), data.frame()), 
       main = "Identification Fragments vs Quantitation Spectrum", 
       xlim = xlim, ylim = ylim, ...)
6: .plotFragmentsVsSpectrum(spectra = spectra, sequences = sequences, 
       fragments = fragments, gridSearchResult = fmrow$gridSearchResult, 
       legend.cex = legend.cex, fragments.cex = fragments.cex, ...)
5: .plotCxRow(fmrow = fm[i, ], identFragments = obj$IdentFragmentData, 
       quantSpectra = obj$QuantSpectrumData, ...)
4: .plotFragmentMatching(object, key, column = column, verbose = verbose, 
       tolerance = object$FragmentMatchingPpmTolerance/1e+06, ...)
3: .local(object, ...)
2: plotFragmentMatching(synobj2, key = "TALIDGLAQR", column = "peptide.seq", 
       verbose = FALSE)
1: plotFragmentMatching(synobj2, key = "TALIDGLAQR", column = "peptide.seq", 
       verbose = FALSE)
> sessionInfo()
R Under development (unstable) (2015-07-01 r68620)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] synapterdata_1.7.0  synapter_1.99.0     MSnbase_1.17.12    
 [4] ProtGenerics_1.1.0  BiocParallel_1.3.34 mzR_2.3.1          
 [7] Rcpp_0.11.6         Biobase_2.29.1      BiocGenerics_0.15.3
[10] xtable_1.7-4       

loaded via a namespace (and not attached):
 [1] compiler_3.3.0        RColorBrewer_1.1-2    XVector_0.9.1        
 [4] BiocInstaller_1.19.8  futile.logger_1.4.1   plyr_1.8.3           
 [7] iterators_1.0.7       futile.options_1.0.0  tools_3.3.0          
[10] zlibbioc_1.15.0       MALDIquant_1.12       digest_0.6.8         
[13] preprocessCore_1.31.0 gtable_0.1.2          lattice_0.20-33      
[16] foreach_1.4.2         proto_0.3-10          hwriter_1.3.2        
[19] knitr_1.10.5          stringr_1.0.0         Biostrings_2.37.2    
[22] S4Vectors_0.7.10      IRanges_2.3.14        multtest_2.25.2      
[25] stats4_3.3.0          grid_3.3.0            qvalue_2.1.0         
[28] impute_1.43.0         survival_2.38-3       XML_3.98-1.3         
[31] limma_3.25.13         cleaver_1.7.0         ggplot2_1.0.1        
[34] reshape2_1.4.1        lambda.r_1.1.7        magrittr_1.5         
[37] splines_3.3.0         scales_0.2.5          pcaMethods_1.59.0    
[40] codetools_0.2-14      MASS_7.3-43           mzID_1.7.1           
[43] colorspace_1.2-6      stringi_0.5-5         affy_1.47.1          
[46] munsell_0.4.2         doParallel_1.0.8      vsn_3.37.3           
[49] affyio_1.37.0        

But, here is where things become interesting...

In the same session as above:

> sequences <- c("TALIDGLAQR", NA)
> nchar(sequences)
[1] 10 NA

In another (but see R version)

> sequences <- c("TALIDGLAQR", NA)
> nchar(sequences)
[1] 10  2
> sessionInfo()
R version 3.2.1 Patched (2015-07-01 r68620)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.1

It looks like NA is new in R devel (I think this is the right behaviour, by the way).

Before doing any changes in the code, why do we end up with an NA, why don't we have the sequence in sequence[[2]] (and why not sequence[2])?

lgatto commented 9 years ago

Followup:

> version
               _                                          
platform       x86_64-unknown-linux-gnu                   
arch           x86_64                                     
os             linux-gnu                                  
system         x86_64, linux-gnu                          
status         Patched                                    
major          3                                          
minor          2.1                                        
year           2015                                       
month          07                                         
day            01                                         
svn rev        68620                                      
language       R                                          
version.string R version 3.2.1 Patched (2015-07-01 r68620)
nickname       World-Famous Astronaut                     
> nchar(NA, keepNA=FALSE) ## default
[1] 2
> nchar(NA, keepNA=NA) 
[1] NA
> args(nchar)
function (x, type = "chars", allowNA = FALSE, keepNA = FALSE) 
NULL

and

> version
               _                                                 
platform       x86_64-unknown-linux-gnu                          
arch           x86_64                                            
os             linux-gnu                                         
system         x86_64, linux-gnu                                 
status         Under development (unstable)                      
major          3                                                 
minor          3.0                                               
year           2015                                              
month          07                                                
day            01                                                
svn rev        68620                                             
language       R                                                 
version.string R Under development (unstable) (2015-07-01 r68620)
nickname       Unsuffered Consequences                           
> nchar(NA, keepNA=NA) ## default
[1] NA
> nchar(NA, keepNA=FALSE)
[1] 2
> args(nchar)
function (x, type = "chars", allowNA = FALSE, keepNA = NA) 
NULL
lgatto commented 9 years ago

So we can set keepNA = FALSE to keep the old behaviour, but I'm still surprised about the fact that sequences has a NA.

sgibb commented 9 years ago

Ok, I hopefully fixed it for the current R and R-devel.

IMHO it is very normal that sequences has a lot of NAs. The fragment matching compares the fragments of the identification run (top) to the spectra of the quantitation run (bottom). In contrast to the fragments the quantitation spectra aren't always identified (so no sequence information is available, this is also true if both runs are HDMSE).

lgatto commented 9 years ago

IMHO it is very normal that sequences has a lot of NAs. The fragment matching compares the fragments of the identification run (top) to the spectra of the quantitation run (bottom). In contrast to the fragments the quantitation spectra aren't always identified (so no sequence information is available, this is also true if both runs are HDMSE)

Indeed. Thanks for the clarification.