lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

.pin file not opening in percolator #107

Closed by jordancurrie 7 months ago

jordancurrie commented 7 months ago

Hey there,

Thanks for creating sage!

I'm having an issue processing files in percolator. The error message is as follows: "Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. at mainCRTStartup()"

I'm able to run other .pin files so it seems like the issue lies somewhere in the .pin files I'm generating in sage. I can also reproduce this error if I generate a .pin from the test file included in the sage download. I'm running Version 0.14.4.

Also feel free to let me know if you think I should try the folks at percolator instead.

lazear commented 7 months ago

Sounds like a memory safety issue with percolator... not the first time! See: https://github.com/percolator/percolator/issues/352

I think previous issues might have been from no decoys? Perhaps check your pin files. If they have decoys, can you upload your pin file to the issue above? Even if it's an issue with Sage output, percolator still has a memory access violation that should be fixed.

You can also try mokapot, which is very similar to percolator and should work with Sage results.

jordancurrie commented 7 months ago

sage_test.zip

Thanks for looking at it! Decoys are present, and they seem to get the correct -1 label. I'm attaching the results .tsv and the .pin. I'll see if I can try mokapot as well. Thanks!
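For anyone wanting to repeat the decoy check on their own output, here is a minimal sketch in Python. It assumes the .pin file is tab-delimited with a header row containing a `Label` column, where `1` marks targets and `-1` marks decoys. The function names and the `results.sage.pin` path in the usage note are illustrative, not part of Sage or percolator.

```python
import csv
from collections import Counter


def count_pin_labels(path):
    """Count the values found in the Label column of a tab-delimited .pin file."""
    counts = Counter()
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Locate the Label column by name rather than assuming its position.
        label_idx = [name.strip().lower() for name in header].index("label")
        for row in reader:
            # Skip blank or truncated rows instead of failing on them.
            if len(row) > label_idx:
                counts[row[label_idx].strip()] += 1
    return counts


def has_targets_and_decoys(counts):
    """True when at least one target (1) and one decoy (-1) row are present."""
    return counts.get("1", 0) > 0 and counts.get("-1", 0) > 0
```

Calling `count_pin_labels("results.sage.pin")` returns a `Counter` keyed by the label strings. If `has_targets_and_decoys` returns `False`, the file is missing either targets or decoys, which is the situation suspected in the earlier percolator issue.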

lazear commented 7 months ago

I'm able to successfully process it with both mokapot (v0.9.1) and percolator (v3.06.2):

Protein decoy-prefix used is DECOY_
All files have been read
Percolator version 3.06.2, Build Date Oct  5 2023 15:32:18
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
C:\Program Files\percolator-v3-06\bin\percolator.exe .\results.sage.pin
Started Mon Dec  4 12:33:03 2023
Hyperparameters: selectionFdr=0.01, Cpos=0, Cneg=0, maxNiter=10
Reading tab-delimited input from datafile .\results.sage.pin
Features:
rank z=2 z=3 z=4 z=5 z=6 z=other peptide_len missed_cleavages isotope_error ln(precursor_ppm) fragment_ppm ln(hyperscore) ln(delta_next) ln(delta_best) aligned_rt predicted_rt sqrt(delta_rt_model) matched_peaks longest_b longest_y longest_y_pct ln(matched_intensity_pct) scored_candidates ln(-poisson)
Found 14558 PSMs
Concatenated search input detected, skipping both target-decoy competition and mix-max.
Train/test set contains 8624 positives and 5934 negatives, size ratio=1.45332 and pi0=1
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Split 1:        Selected feature 25 as initial direction. Could separate 1267 training set positives with q<0.01 in that direction.
Split 2:        Selected feature 25 as initial direction. Could separate 1246 training set positives with q<0.01 in that direction.
Split 3:        Selected feature 25 as initial direction. Could separate 1244 training set positives with q<0.01 in that direction.
Found 1868 test set positives with q<0.01 in initial direction
Reading in data and feature calculation took 0.1500 cpu seconds or 0 seconds wall clock time.
---Training with Cpos selected by cross validation, Cneg selected by cross validation, initial_fdr=0.01, fdr=0.01
Iteration 1:    Estimated 2136 PSMs with q<0.01
Iteration 2:    Estimated 2204 PSMs with q<0.01
Iteration 3:    Estimated 2241 PSMs with q<0.01
Iteration 4:    Estimated 2241 PSMs with q<0.01
Iteration 5:    Estimated 2263 PSMs with q<0.01
Iteration 6:    Estimated 2274 PSMs with q<0.01
Iteration 7:    Estimated 2261 PSMs with q<0.01
Iteration 8:    Estimated 2273 PSMs with q<0.01
Iteration 9:    Estimated 2264 PSMs with q<0.01
Iteration 10:   Estimated 2266 PSMs with q<0.01
Learned normalized SVM weights for the 3 cross-validation splits:
 Split1  Split2  Split3 FeatureName
 0.0000  0.0000  0.0000 rank
-0.1048 -0.1332 -0.1221 z=2
 0.0669  0.0699  0.0693 z=3
 0.0440  0.1227  0.0596 z=4
 0.1101  0.0391  0.1535 z=5
 0.0000  0.0000  0.0000 z=6
 0.0000  0.0000  0.0000 z=other
-0.2547 -0.1515 -0.3051 peptide_len
-0.3281 -0.3915 -0.4159 missed_cleavages
 0.0669  0.0238 -0.0752 isotope_error
-0.1572 -0.4282 -0.2577 ln(precursor_ppm)
-0.6218 -0.5819 -1.0130 fragment_ppm
 0.2839  0.2536  0.0699 ln(hyperscore)
 0.6633  0.4595  1.0129 ln(delta_next)
 0.0000  0.0000  0.0000 ln(delta_best)
 0.3476  0.1913  0.3697 aligned_rt
-0.4516 -0.1200 -0.2104 predicted_rt
-0.3902 -0.3369 -0.4974 sqrt(delta_rt_model)
 0.2774  0.5597  1.0114 matched_peaks
-0.1156 -0.1926 -0.4122 longest_b
 0.5850  0.2864  0.6289 longest_y
-0.0271  0.3568  0.3482 longest_y_pct
 0.0116 -0.1743 -0.0713 ln(matched_intensity_pct)
 0.0523 -0.0754  0.1222 scored_candidates
 0.5985  0.6356  0.7988 ln(-poisson)
-2.4482 -1.8603 -3.2667 m0
Found 2175 test set PSMs with q<0.01.
Tossing out "redundant" PSMs keeping only the best scoring PSM for each unique peptide.
Calculating q values.
Final list yields 1549 target peptides with q<0.01.
Calculating posterior error probabilities (PEPs).
Processing took 2.8370 cpu seconds or 3 seconds wall clock time.

jordancurrie commented 7 months ago

Got it! The issue was that I didn't realize there is a difference between crux percolator and standalone percolator. With a fresh standalone percolator install it's working. So there may be an issue with crux percolator, but that seems like a totally different beast. I'd say this is resolved, and I'll move away from crux for now. Thanks for your help!
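Since the fix came down to which percolator was being run, a quick way to see what is installed is to look the executables up on `PATH`. The sketch below uses only Python's standard library. It assumes the standalone build is installed as `percolator` and Crux as `crux`; adjust the names if your install differs. `locate_tools` is an illustrative helper, not part of either tool.

```python
import shutil


def locate_tools(names=("percolator", "crux")):
    """Map each executable name to its resolved path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If both resolve, check which one your pipeline or script actually invokes before you compare results.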