rpeckner-broad / Specter

Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics

Linear Discriminant Analysis Error #8

Closed by DarioS 6 years ago

DarioS commented 6 years ago

I get an error when singular value decomposition is executed.

$ ./SpecterStandalone.sh ~/SpecterTest/CC20170118_SAM_Specter_Ecolidigest_DDA_01 ~/SpecterTest/EcoliSpectralLibrary 2000 end 12
Library loaded in 2.0 minutes
Loaded 1024 MS2 spectra from /home/dario/SpecterTest/CC20170118_SAM_Specter_Ecolidigest_DDA_01.mzML in 1.2 minutes.
Header written to /home/dario/SpecterTest/SpecterResults/CC20170118_SAM_Specter_Ecolidigest_DDA_01_EcoliSpectralLibrary_header.csv.
Analyzing MS2 spectra:
100% 1023/1023 [04:46<00:00,  3.57it/s]
Output written to /home/dario/SpecterTest/SpecterResults/CC20170118_SAM_Specter_Ecolidigest_DDA_01_EcoliSpectralLibrary_SpecterCoeffs.csv.
Analyzing MS2 spectra with decoys:
100% 1023/1023 [04:43<00:00,  3.61it/s]
Output written to /home/dario/SpecterTest/SpecterResults/CC20170118_SAM_Specter_Ecolidigest_DDA_01_EcoliSpectralLibrary_SpecterCoeffsDecoys.csv.
Loading required package: kza
Loading required package: pracma
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: moments
There were 50 or more warnings (use warnings() to see the first 50)
There were 50 or more warnings (use warnings() to see the first 50)
Error in svd(X, nu = 0L) : infinite or missing values in 'x'
Calls: lda -> lda.formula -> lda.default -> svd
Execution halted

How can I successfully complete the LDA?

rpeckner-broad commented 6 years ago

Unfortunately, this error is unavoidable when only a small number (<10,000) of MS2 spectra have been selected for analysis, as seems to be the case here. The problem is that very few decoy spectra (or true library spectra) will be identified when so few MS2 spectra are considered, especially since they come from the hydrophilic early stage of the run. The LDA then fails because there is essentially no distribution of decoy scores to separate from the target scores, which renders an FDR meaningless. You can verify this by checking the contents of the _SpecterCoeffsDecoys.csv file: most likely, very few decoy precursors (identified by the DECOY_ prefix in their sequences) will have any nonzero coefficients at all, and a nonzero coefficient is in any case not enough on its own to qualify for identification.
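As a rough way to run that check, here is a minimal Python sketch that counts decoy rows and the distinct decoy sequences with a nonzero coefficient. The column layout of the coefficients file isn't described in this thread, so `seq_col` and `coeff_col` are hypothetical zero-based positions that you would need to set after looking at the accompanying header file. The function name `count_decoy_signal` and the sample rows are made up for illustration and are not part of Specter.

```python
import csv
import io
import math


def count_decoy_signal(rows, seq_col, coeff_col, prefix="DECOY_"):
    """Return (decoy rows seen, distinct decoy sequences with a nonzero coefficient).

    seq_col and coeff_col are zero-based column positions. They are
    placeholders here, not the documented Specter column layout.
    """
    decoy_rows = 0
    nonzero = set()
    for row in rows:
        if len(row) <= max(seq_col, coeff_col):
            continue  # skip blank or short lines
        seq = row[seq_col].strip()
        if not seq.startswith(prefix):
            continue  # target (non-decoy) precursor
        decoy_rows += 1
        try:
            coeff = float(row[coeff_col])
        except ValueError:
            continue  # header line or non-numeric field
        if math.isfinite(coeff) and coeff != 0.0:
            nonzero.add(seq)
    return decoy_rows, len(nonzero)


# Made-up rows for illustration only (not real Specter output).
sample = io.StringIO(
    "0,DECOY_PEPTIDEA,2,0.0\n"
    "0,DECOY_PEPTIDEB,2,13.5\n"
    "1,DECOY_PEPTIDEB,2,4.2\n"
    "1,PEPTIDEC,3,88.1\n"
)
decoy_rows, decoys_with_signal = count_decoy_signal(
    csv.reader(sample), seq_col=1, coeff_col=3
)
print(decoy_rows, decoys_with_signal)
```

To use it on a real results file, open the file with `open(path, newline="")` and pass `csv.reader(handle)` as `rows`. If the second number comes back as only a handful, there is effectively no decoy score distribution for the LDA to work with.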

I would recommend increasing the 2000 parameter to at least 10000, and preferably higher, depending on how long you're willing to wait. Also, I'm not sure how meaningful the results will be when you're analyzing a DDA rather than a DIA experiment. Conceptually I see no problem at the level of the linear deconvolution, but the identifications and FDR estimation may be much more problematic, because they rely on sampling each precursor multiple times across its elution peak, which you don't get in DDA (we have dynamic exclusion on in most of our DDA experiments).