Closed gsaxena888 closed 1 year ago
Without having the mgf files, I can't say - but assuming they produce valid mzMLs, Sage should be able to handle them. I have successfully converted mzXMLs to mzMLs and searched them. If you find that they don't work, please let me know and I will push out a fix.
Also, you can directly search DIA data (no DIA-specific quant yet though) with Sage 😃 - I have successfully searched data from TTOFs, Orbitraps, and even the Astral. Be warned that performance may not be up to par with other tools at this time, since it's not specifically designed for searching DIA data.
Something like the following params typically works well:
{
"chimera": true,
"wide_window": true,
"max_fragment_charge": 2,
"report_psms": 5
}
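For anyone who already has a working config and only wants to flip these switches, here is a minimal Python sketch that merges the four values above into an existing Sage config file. The helper names (`apply_dia_overrides`, `patch_config_file`) and the file paths passed to them are hypothetical; only the four keys and their values come from the comment above.

```python
import json

# The four overrides suggested above for searching DIA data.
DIA_OVERRIDES = {
    "chimera": True,
    "wide_window": True,
    "max_fragment_charge": 2,
    "report_psms": 5,
}


def apply_dia_overrides(config: dict) -> dict:
    """Return a copy of a config dict with the DIA overrides applied.

    All other keys are passed through unchanged; the input is not mutated.
    """
    merged = dict(config)
    merged.update(DIA_OVERRIDES)
    return merged


def patch_config_file(src_path: str, dst_path: str) -> None:
    """Read a JSON config, apply the overrides, and write the result (hypothetical helper)."""
    with open(src_path) as fh:
        config = json.load(fh)
    with open(dst_path, "w") as fh:
        json.dump(apply_dia_overrides(config), fh, indent=2)
```

Writing to a separate output path keeps the original config around, which makes it easy to compare a DDA-style and a DIA-style search on the same files.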
So I converted a portion of an mgf file to mzML using msconvert, but when I tried to run it through sage, it errored out. (Note: I believe a similar mgf to mzML conversion that I did years ago for msfragger worked fine; also, the error message from sage said that there was no ms1 info, but I did notice some basic ms1 info in the mzML file). Here is the small mgf, the converted mzML (via msconvert running on Linux), and the fasta and simple config files:
<deleted attachment; see next comment>
And the full error message was:
bash-5.1# sage config2.json
[2023-07-18T20:58:16Z INFO sage] generated 722 fragments in 344ms
thread '<unnamed>' panicked at 'missing precursor information for MS2 scan, please check input files!', src/spectrum.rs:220:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)
Thoughts?
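Since the panic message is about missing precursor information on an MS2 scan, one way to narrow this down is to scan the converted mzML for MS2 spectra that carry no precursor m/z. The sketch below is an assumption-laden check, not a description of what Sage itself requires: it treats "precursor information" as a `selected ion m/z` cvParam (PSI-MS accession `MS:1000744`) somewhere inside the spectrum, and it expects `ms level` (`MS:1000511`) as a direct child cvParam of `<spectrum>`. The function name `ms2_without_precursor` is hypothetical.

```python
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
MS_LEVEL = "MS:1000511"         # PSI-MS accession for "ms level"
SELECTED_ION_MZ = "MS:1000744"  # PSI-MS accession for "selected ion m/z"


def ms2_without_precursor(source):
    """Return the ids of MS2 spectra that have no selected ion m/z.

    `source` may be a file path or a binary file-like object.
    """
    missing = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        # Assumption: ms level is stored as a direct child cvParam.
        level = None
        for cv in elem.findall(MZML_NS + "cvParam"):
            if cv.get("accession") == MS_LEVEL:
                level = cv.get("value")
        # Look through all nested cvParams for a precursor m/z.
        has_precursor_mz = any(
            cv.get("accession") == SELECTED_ION_MZ
            for cv in elem.iter(MZML_NS + "cvParam")
        )
        if level == "2" and not has_precursor_mz:
            missing.append(elem.get("id"))
        elem.clear()  # free memory for large files
    return missing
```

An empty list for the converted file would suggest the mzML itself is fine and the problem lies elsewhere (for example, in which file the tool actually ends up reading).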
Please use this attachment and DELETE the previous one: allFiles.tar.gz @lazear
I am able to search that mzML fine. Which version of Sage are you using?
#!/bin/bash
wget https://github.com/lazear/sage/releases/download/v0.13.3/sage-v0.13.3-x86_64-unknown-linux-gnu.tar.gz
wget https://github.com/lazear/sage/files/12089000/allFiles.tar.gz
tar xvf allFiles.tar.gz
tar xvf sage-v0.13.3-x86_64-unknown-linux-gnu.tar.gz
SAGE_LOG=trace ./sage-v0.13.3-x86_64-unknown-linux-gnu/sage config2.json -f fasta_with_decoy.fasta input.mzML
[2023-07-18T22:52:28Z TRACE sage_core::database] modifying peptides
[2023-07-18T22:52:31Z TRACE sage_core::database] sorting and deduplicating peptides
[2023-07-18T22:52:32Z TRACE sage_core::database] generating fragments
[2023-07-18T22:52:32Z TRACE sage_core::database] finalizing index
[2023-07-18T22:52:33Z INFO sage] generated 104076437 fragments, 6153956 peptides in 5711ms
[2023-07-18T22:52:33Z INFO sage] processing files 0 .. 1
[2023-07-18T22:52:33Z TRACE sage] - input.mzML: read 476 spectra
[2023-07-18T22:52:33Z INFO sage] - file IO: 50 ms
[2023-07-18T22:52:33Z INFO sage] - search: 65 ms (476 spectra)
[2023-07-18T22:52:33Z INFO sage_core::ml::retention_alignment] aligning file #0: y = 1.0000x + 0.0000
[2023-07-18T22:52:33Z INFO sage_core::ml::retention_alignment] aligned retention times across 1 files
[2023-07-18T22:52:33Z INFO sage_core::ml::retention_model] - fit retention time model, rsq = NaN
[2023-07-18T22:52:33Z TRACE sage_core::ml::linear_discriminant] fitting linear discriminant model...
[2023-07-18T22:52:33Z TRACE sage_core::ml::linear_discriminant] - linear model fit with {"rank": -0.0, "charge": -0.008937116197891792, "ln1p(hyperscore)": 0.02537321405895736, "ln1p(delta_next)": 0.0008479605715541809, "ln1p(delta_best)": -0.0, "delta_mass_model": -0.2984225949733574, "isotope_error": -0.0027129223649496534, "average_ppm": -0.005395587524408603, "ln1p(-poisson)": -0.04070435342311652, "ln1p(matched_intensity_pct)": 0.0007503495236815217, "ln1p(matched_peaks)": 0.004289864991880172, "ln1p(longest_b)": 0.004771601232359431, "ln1p(longest_y)": -0.14894784710117093, "longest_y_pct": 0.8432765693783117, "ln1p(peptide_len)": -0.09478725602175102, "missed_cleavages": -0.0008780636293367089, "rt": -0.21546556691038277, "sqrt(delta_rt_model)": 0.3460821778822863}
[2023-07-18T22:52:33Z TRACE sage_core::ml::linear_discriminant] - fitting non-parametric model for posterior error probabilities
[2023-07-18T22:52:39Z INFO sage] discovered 0 target peptide-spectrum matches at 1% FDR
[2023-07-18T22:52:39Z INFO sage] discovered 0 target peptides at 1% FDR
[2023-07-18T22:52:39Z INFO sage] discovered 0 target proteins at 1% FDR
[2023-07-18T22:52:39Z TRACE sage] writing outputs
{
"version": "0.13.3",
"database": {
"bucket_size": 16384,
"enzyme": {
"missed_cleavages": 1,
"min_len": null,
"max_len": null,
"cleave_at": "KR",
"restrict": "P",
"c_terminal": null
},
"fragment_min_mz": 150.0,
"fragment_max_mz": 1500.0,
"peptide_min_mass": 500.0,
"peptide_max_mass": 5000.0,
"ion_kinds": [
"b",
"y"
],
"min_ion_index": 2,
"static_mods": {
"C": 57.0216
},
"variable_mods": {},
"max_variable_mods": 2,
"decoy_tag": "rev_",
"generate_decoys": true,
"fasta": "fasta_with_decoy.fasta"
},
"quant": {
"tmt": null,
"tmt_settings": {
"level": 3,
"sn": false
},
"lfq": false,
"lfq_settings": {
"peak_scoring": "Hybrid",
"integration": "Sum",
"spectral_angle": 0.7,
"ppm_tolerance": 5.0
}
},
"precursor_tol": {
"ppm": [
-50.0,
50.0
]
},
"fragment_tol": {
"ppm": [
-10.0,
10.0
]
},
"isotope_errors": [
-1,
3
],
"deisotope": true,
"chimera": false,
"wide_window": false,
"min_peaks": 15,
"max_peaks": 150,
"max_fragment_charge": 1,
"min_matched_peaks": 4,
"report_psms": 1,
"predict_rt": true,
"mzml_paths": [
"input.mzML"
],
"output_paths": [
"/mnt/d/Github/sage/issue76/results.sage.tsv",
"/mnt/d/Github/sage/issue76/results.json"
]
}
I was using docker, i.e., sudo docker pull ghcr.io/lazear/sage:master
Can you share the full setup you used, e.g., volume mounts, and anything else that might help me troubleshoot?
@lazear the non-docker version seems to work fine so far!
That is still somewhat concerning... they should behave identically (unless something is up with the volume mounts etc). I use the docker image via AWS Batch without issues, but admittedly haven't done much testing using docker locally
I have mgf files only (no mzML). I'm thinking of using msconvert (on Linux) to convert the mgf to mzML. But I know that some programs don't always work 100% properly when that occurs (due to some expectation of what should be in an mzML file, etc.). Is there any known/guessed issue that might arise if I take a regular/simple mgf file and try to convert it to mzML? (The reason for this need: the mgf is a pseudo generated file, and it's generated off of DIA data in a manner similar to how DIAUmpire works.)
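One thing that can be checked before converting is whether every entry in the mgf actually carries the fields a downstream tool is likely to want. The sketch below counts entries that lack `PEPMASS` (the precursor m/z field in MGF) or `CHARGE`. Which fields a given converter or search engine really needs is an assumption here, and the function name `summarize_mgf` is hypothetical.

```python
def summarize_mgf(lines):
    """Count MGF entries and how many lack PEPMASS or CHARGE.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    total = no_pepmass = no_charge = 0
    keys = None  # header keys seen in the current BEGIN IONS block
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            keys = set()
        elif line == "END IONS" and keys is not None:
            total += 1
            no_pepmass += "PEPMASS" not in keys
            no_charge += "CHARGE" not in keys
            keys = None
        elif keys is not None and "=" in line:
            # Header lines look like KEY=value; peak lines contain no "=".
            keys.add(line.split("=", 1)[0].upper())
    return {
        "spectra": total,
        "missing_pepmass": no_pepmass,
        "missing_charge": no_charge,
    }
```

Usage would be along the lines of `with open("input.mgf") as fh: print(summarize_mgf(fh))`, where `input.mgf` is a placeholder path.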