Closed andrewjmc closed 4 years ago
I have confirmed that rerunning the dinosaur command gets the same error.
I think the problem is that the targets file (search_and_link_124_to_122.psms.pout.dinosaur_targets.tsv) is empty:
mz charge mzDiff rtStart rtEnd minApexInt id
0
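A quick way to confirm that a targets file holds only its header is to count the rows after the first line. This is a minimal sketch, assuming the file is tab-separated with a single header line; `count_data_rows` is a hypothetical helper and not part of Quandenser or Dinosaur:

```python
import csv

def count_data_rows(tsv_path):
    """Return the number of non-empty rows after the header line."""
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line, if there is one
        return sum(1 for row in reader if row)
```

A result of 0 for the `.dinosaur_targets.tsv` file would mean that only the header was written.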
In the run-up to the dinosaur run, everything seemed OK:
Matching features 124->122 (462/482)
Features:
ppmDiff rtDiff precMz rTime queryIsPlaceHolder targetIsPlaceHolder charge1 charge2 charge3 charge4plus
Percolator version 3.02.1, Build Date Jan 16 2020 07:33:00
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
percolator --only-psms --post-processing-tdc input_file_placeholder --trainFDR 0.02 --testFDR 0.02 --results-psms .\quandenser_output_only_samples_noOX1\percolator/link_124_to_122.psms --decoy-results-psms .\quandenser_output_only_samples_noOX1\percolator/link_124_to_122.psms.decoys
Started Sat Jan 25 01:32:13 2020
Hyperparameters: selectionFdr=0.02, Cpos=0, Cneg=0, maxNiter=10
FeatureNames::getNumFeatures(): 10
Train/test set contains 1560337 positives and 1083859 negatives, size ratio=1.43961 and pi0=1
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Found 113958 test set positives with q<0.02 in initial direction
Reading in data and feature calculation took 10.76 cpu seconds or 11 seconds wall clock time.
---Training with Cpos selected by cross validation, Cneg selected by cross validation, initial_fdr=0.02, fdr=0.02
Iteration 1: Estimated 127448 PSMs with q<0.02
Iteration 2: Estimated 137562 PSMs with q<0.02
Iteration 3: Estimated 145186 PSMs with q<0.02
Iteration 4: Estimated 151166 PSMs with q<0.02
Iteration 5: Estimated 155651 PSMs with q<0.02
Iteration 6: Estimated 158503 PSMs with q<0.02
Iteration 7: Estimated 160781 PSMs with q<0.02
Iteration 8: Estimated 162627 PSMs with q<0.02
Iteration 9: Estimated 164150 PSMs with q<0.02
Iteration 10: Estimated 165361 PSMs with q<0.02
Learned normalized SVM weights for the 3 cross-validation splits:
Split1 Split2 Split3 FeatureName
-25.6049 -25.3290 -24.9957 ppmDiff
-2.1010 -2.0142 -2.0114 rtDiff
-0.0281 -0.0343 -0.0294 precMz
-0.0914 -0.0870 -0.0877 rTime
-0.7556 -0.7668 -0.7499 queryIsPlaceHolder
0.0105 0.0081 0.0081 targetIsPlaceHolder
0.1344 0.1286 0.1287 charge1
-0.0431 -0.0448 -0.0434 charge2
-0.0487 -0.0462 -0.0485 charge3
-0.0162 -0.0102 -0.0095 charge4plus
-32.1126 -31.6627 -31.2687 m0
Found 165368 test set PSMs with q<0.02.
Selected best-scoring PSM per scan+expMass (target-decoy competition): 1115252 target PSMs and 706656 decoy PSMs.
Calculating q values.
Final list yields 149751 target PSMs with q<0.02.
Calculating posterior error probabilities (PEPs).
Processing took 332.5 cpu seconds or 333 seconds wall clock time.
Links before 0
Links after 164860
The PSMs file (link_124_to_122.psms) is 61 MB. The decoys file (link_124_to_122.psms.decoys) is very small and I'm unsure why. They are usually only a little smaller than the PSMs files.
PSMId score q-value posterior_error_prob peptide proteinIds
193678_868.657532_51.7893143_1799528_1 0.136907 0.000793021 0.00403301 A.154146_868.657471_51.7787857_0_1.A
68976_1376.31238_61.6550255_702506.75_1 0.130283 0.00115075 0.00477215 A.68491_1376.31226_61.6410866_1197500.75_1.A
67510_1271.77832_78.6169662_59034.7578_1 0.129991 0.00152964 0.00480767 A.66934_1271.77844_78.6167526_109898.078_1.A
52868_861.40033_55.5789948_1105606.63_4 0.129954 0.00178444 0.00481215 A.106039_861.40033_55.5467949_0_4.A
25607_619.709656_79.7467804_293563.188_3 0.129224 0.00203943 0.00490228 A.140139_619.709656_79.7593689_0_3.A
Advice gratefully appreciated!
Andrew
That's very strange indeed. According to the logs, there should be 706656 lines in the decoy file. Could it be that you're running out of disk space?
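Both checks can be scripted. The sketch below is only an illustration: it assumes the Percolator log contains a line of the form "... N target PSMs and M decoy PSMs", as in the output above, and the helper names (`expected_decoys`, `count_lines`, `free_gib`) are hypothetical rather than part of Percolator or Quandenser.

```python
import re
import shutil

def expected_decoys(log_text):
    """Extract the decoy PSM count reported in the Percolator log, or None."""
    match = re.search(r"(\d+) target PSMs and (\d+) decoy PSMs", log_text)
    return int(match.group(2)) if match else None

def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)

def free_gib(directory):
    """Free space, in GiB, on the volume that holds `directory`."""
    return shutil.disk_usage(directory).free / 2**30
```

If `count_lines` on the `.decoys` file returns far fewer lines than `expected_decoys` reports, and `free_gib` on the output directory is close to zero, a full disk is the likely cause of the truncated file.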
Great guess. Absolutely right. Should have checked, feeling sheepish!
Over the weekend the rerun (now with 16 GB of RAM allocated to dinosaur) hit a snag with an Array Index error. I'm sure it's not your fault and I'll rerun, in the hope it doesn't recur. I think there are still about another 20-30 links to process.
Best wishes,
Andrew