lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Error: unknown variant `tmt` #60

Closed by lgatto 1 year ago

lgatto commented 1 year ago

Running the latest version from GitHub (version 0.10.0 according to the ChangeLog), I get the following error when performing TMT quantitation:

Error: Error("unknown variant `tmt`, expected one of `Tmt6`, `Tmt10`, `Tmt11`, `Tmt16`, `Tmt18`, `User`", line: 19, column: 13)

The exact same config file using sage-0.8.1 works. Here's the output:


$ ~/bin/sage/sage-0.8.1/target/release/sage ../extdata/tmt2.json
[2023-04-04T14:12:30Z INFO  sage] generated 17153170 fragments in 3800ms
[2023-04-04T14:12:30Z INFO  sage] processing files 0 .. 4 
[2023-04-04T14:12:44Z INFO  sage]  - file IO:     4367 ms
[2023-04-04T14:12:44Z INFO  sage]  - search:      9699 ms (157586 spectra)
[2023-04-04T14:12:44Z INFO  sage] processing files 4 .. 8 
[2023-04-04T14:12:58Z INFO  sage]  - file IO:     4379 ms
[2023-04-04T14:12:58Z INFO  sage]  - search:      9562 ms (158463 spectra)
[2023-04-04T14:12:58Z INFO  sage] processing files 8 .. 12 
[2023-04-04T14:13:12Z INFO  sage]  - file IO:     4272 ms
[2023-04-04T14:13:12Z INFO  sage]  - search:      8993 ms (154619 spectra)
[2023-04-04T14:13:15Z INFO  sage] discovered 95065 peptide-spectrum matches at 1% FDR
[2023-04-04T14:13:15Z INFO  sage] discovered 75903 peptides at 1% FDR
[2023-04-04T14:13:15Z INFO  sage] discovered 10390 proteins at 1% FDR
[2023-04-04T14:13:17Z INFO  sage] finished in 50s
{
  "database": {
    "bucket_size": 16384,
    "enzyme": {
      "missed_cleavages": 0,
      "min_len": 5,
      "max_len": 50,
      "cleave_at": "KR",
      "restrict": "P"
    },
    "fragment_min_mz": 150.0,
    "fragment_max_mz": 1500.0,
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "min_ion_index": 2,
    "static_mods": {
      "C": 57.0215,
      "^": 229.1629,
      "K": 229.1629
    },
    "variable_mods": {},
    "max_variable_mods": 2,
    "decoy_tag": "rev_",
    "generate_decoys": true,
    "fasta": "/mnt/isilon/CBIO/data/SCPCBIO/fasta/UP000005640_9606.fasta"
  },
  "quant": {
    "tmt": "Tmt11",
    "tmt_level": 2,
    "lfq": null
  },
  "precursor_tol": {
    "ppm": [
      -20.0,
      20.0
    ]
  },
  "fragment_tol": {
    "ppm": [
      -10.0,
      10.0
    ]
  },
  "isotope_errors": [
    -1,
    3
  ],
  "deisotope": true,
  "chimera": true,
  "min_peaks": 15,
  "max_peaks": 150,
  "max_fragment_charge": 1,
  "report_psms": 1,
  "predict_rt": true,
  "parallel": true,
  "mzml_paths": [
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e6e68a154_dq_00082_11cell_90min_hrMS2_A1.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e32eb78de_dq_00083_11cell_90min_hrMS2_A3.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35ee5d5a37_dq_00084_11cell_90min_hrMS2_A5.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e3fb19375_dq_00085_11cell_90min_hrMS2_A7.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e790c7b57_dq_00086_11cell_90min_hrMS2_A9.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e72d009e6_dq_00087_11cell_90min_hrMS2_A11.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e50a7dd0c_dq_00088_11cell_90min_hrMS2_B1.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e21356dc_dq_00089_11cell_90min_hrMS2_B3.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e7c19e7c6_dq_00090_11cell_90min_hrMS2_B5.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e78048db8_dq_00091_11cell_90min_hrMS2_B7.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e484bf56b_dq_00092_11cell_90min_hrMS2_B9.mzML",
    "/home/lgatto/.cache/R/rpx/3ff35e23e0b2c0_3ff35e127a1e80_dq_00093_11cell_90min_hrMS2_B11.mzML"
  ],
  "output_paths": [
    "output2/results.sage.tsv",
    "output2/quant.tsv",
    "output2/results.json"
  ]
}
lazear commented 1 year ago

Weird - can you paste the "quant" section of the config file (or the full file)? I have searched several thousand files using the v0.10 release with TMT, and there shouldn't be any breaking changes between these versions in how this section of the config file is parsed.

lgatto commented 1 year ago

The json file was generated from tmt.json: I loaded it into R, modified it, and serialised back to get the file below:

{
  "database": {
    "bucket_size": 16384,
    "fragment_min_mz": 150,
    "fragment_max_mz": 1500,
    "peptide_min_len": 5,
    "peptide_max_len": 50,
    "min_ion_index": 2,
    "missed_cleavages": 1,
    "static_mods": {
      "^": 229.1629,
      "K": 229.1629,
      "C": 57.0215
    },
    "decoy_prefix": "rev_",
    "fasta": "/mnt/isilon/CBIO/data/SCPCBIO/fasta/UP000005640_9606.fasta"
  },
  "precursor_tol": {
    "ppm": [-20, 20]
  },
  "fragment_tol": {
    "ppm": [-10, 10]
  },
  "isotope_errors": [-1, 3],
  "report_psms": 1,
  "chimera": true,
  "deisotope": true,
  "output_directory": "output2",
  "max_fragment_charge": 1,
  "mzml_paths": ["dq_00082_11cell_90min_hrMS2_A1.mzML", "dq_00083_11cell_90min_hrMS2_A3.mzML", "dq_00084_11cell_90min_hrMS2_A5.mzML", "dq_00085_11cell_90min_hrMS2_A7.mzML", "dq_00086_11cell_90min_hrMS2_A9.mzML", "dq_00087_11cell_90min_hrMS2_A11.mzML", "dq_00088_11cell_90min_hrMS2_B1.mzML", "dq_00089_11cell_90min_hrMS2_B3.mzML", "dq_00090_11cell_90min_hrMS2_B5.mzML", "dq_00091_11cell_90min_hrMS2_B7.mzML", "dq_00092_11cell_90min_hrMS2_B9.mzML", "dq_00093_11cell_90min_hrMS2_B11.mzML"],
  "quant": {
    "tmt": "Tmt11",
    "tmt_level": 2,
    "sn": true
  }
}

Running

$ ~/dev/sage/target/release/sage ../extdata/tmt2.json

produced the error, while running the following command works:

$ ~/bin/sage/sage-0.8.1/target/release/sage ../extdata/tmt2.json
lazear commented 1 year ago

Very strange. The configuration file parses fine for me with the latest release build downloaded from GH (parsing is the step that raises this error), and it looks completely OK to me. It looks like you are running a locally compiled version - can you try downloading the release build?

This error doesn't really line up with the config file either: the quant section begins on line 31. Error: Error("unknown variant `tmt`, expected one of `Tmt6`, `Tmt10`, `Tmt11`, `Tmt16`, `Tmt18`, `User`", line: 19, column: 13)
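
To check where a reported position actually lands in a config file, a small helper can pull the line and column out of the error text and return the line it points at. This is a minimal sketch, not part of Sage: the function name locate_error is hypothetical, the message format is assumed to match the error quoted in this thread, and the line number is assumed to be 1-based.

```python
import re
from typing import Tuple


def locate_error(config_text: str, error_message: str) -> Tuple[int, int, str]:
    """Return (line, column, text of that line) for a 'line: N, column: M' error.

    Assumes the message contains 'line: N, column: M' as in the error quoted
    in this thread, and that N counts lines starting from 1.
    """
    match = re.search(r"line: (\d+), column: (\d+)", error_message)
    if match is None:
        raise ValueError("no 'line: N, column: M' found in the error message")
    line_no, col_no = int(match.group(1)), int(match.group(2))

    lines = config_text.splitlines()
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line {line_no} is outside the file ({len(lines)} lines)")

    # Convert the 1-based line number to a 0-based list index.
    return line_no, col_no, lines[line_no - 1]
```

If the returned line has nothing to do with the key named in the error, the binary may have read a different file, or parsed it against a different schema, than expected.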

Unrelated, it should be "tmt_sn", not "sn". For this to have any meaning, you'll need to include noise measurements in the mzML files (ThermoRawFileParser has an option for this).
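
A mistyped key such as "sn" is easy to miss by eye, so a quick check of the "quant" section before a run can help. The sketch below is an illustration only: KNOWN_QUANT_KEYS lists just the keys that appear in this thread (including the corrected "tmt_sn"), not the full Sage schema, and unknown_quant_keys is a hypothetical helper name.

```python
import json

# Keys seen in "quant" sections in this thread, plus the corrected "tmt_sn".
# Assumption: this is NOT the complete set of keys Sage accepts.
KNOWN_QUANT_KEYS = {"tmt", "tmt_level", "tmt_sn", "lfq"}


def unknown_quant_keys(config_text: str) -> list:
    """Return the sorted "quant" keys that are not in KNOWN_QUANT_KEYS."""
    config = json.loads(config_text)
    # A missing or null "quant" section is treated as empty.
    quant = config.get("quant") or {}
    return sorted(set(quant) - KNOWN_QUANT_KEYS)
```

For the "quant" section posted earlier in this thread, this returns ["sn"]; after renaming the key to "tmt_sn" it returns an empty list.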

lgatto commented 1 year ago

I'll try the latest release build tomorrow, when I'm back in the office, and report back. Thank you for helping out.

lazear commented 1 year ago

Sounds good, let's also try a minimal example of the config file:

{
  "database": {
    "fasta": "/mnt/isilon/CBIO/data/SCPCBIO/fasta/UP000005640_9606.fasta"
  },
  "precursor_tol": {
    "ppm": [-20, 20]
  },
  "fragment_tol": {
    "ppm": [-10, 10]
  },
  "mzml_paths": ["dq_00082_11cell_90min_hrMS2_A1.mzML", "dq_00083_11cell_90min_hrMS2_A3.mzML", "dq_00084_11cell_90min_hrMS2_A5.mzML", "dq_00085_11cell_90min_hrMS2_A7.mzML", "dq_00086_11cell_90min_hrMS2_A9.mzML", "dq_00087_11cell_90min_hrMS2_A11.mzML", "dq_00088_11cell_90min_hrMS2_B1.mzML", "dq_00089_11cell_90min_hrMS2_B3.mzML", "dq_00090_11cell_90min_hrMS2_B5.mzML", "dq_00091_11cell_90min_hrMS2_B7.mzML", "dq_00092_11cell_90min_hrMS2_B9.mzML", "dq_00093_11cell_90min_hrMS2_B11.mzML"],
  "quant": {
    "tmt": "Tmt11"
  }
}
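
A minimal config like the one above can also be derived from a full config programmatically, which avoids copy-paste mistakes when bisecting a parsing problem. This is a rough sketch under stated assumptions: MINIMAL_KEYS simply mirrors the keys kept in the example above, and minimal_config is a hypothetical helper, not part of Sage.

```python
import json

# Keys kept in the minimal example above. A list means "keep only these
# sub-keys"; None means "keep the whole value unchanged".
MINIMAL_KEYS = {
    "database": ["fasta"],
    "precursor_tol": None,
    "fragment_tol": None,
    "mzml_paths": None,
    "quant": ["tmt"],
}


def minimal_config(full: dict) -> dict:
    """Return a copy of `full` reduced to the keys listed in MINIMAL_KEYS."""
    reduced = {}
    for key, subkeys in MINIMAL_KEYS.items():
        if key not in full:
            continue  # nothing to copy for this section
        if subkeys is None:
            reduced[key] = full[key]
        else:
            reduced[key] = {k: full[key][k] for k in subkeys if k in full[key]}
    return reduced


if __name__ == "__main__":
    # Hypothetical usage: read a full config, print the reduced one.
    with open("tmt2.json") as handle:
        print(json.dumps(minimal_config(json.load(handle)), indent=2))
```

If the reduced file parses and the full one does not, keys can be added back one section at a time to find the one that triggers the error.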
lgatto commented 1 year ago

Indeed, using the latest release and running

~/bin/sage/sage-v0.10.0-x86_64-unknown-linux-gnu/sage ../extdata/tmt2.json

works as expected. Will keep this in mind and always use releases in the future.

Thanks!

lazear commented 1 year ago

Still somewhat concerning - there shouldn't be any changes in the master branch since v0.10 that would affect this! But good to hear that the release build works.

lgatto commented 1 year ago

If it helps, I built it with cargo build --release, as indicated in the README.

One explanation could be that I forgot to pull the latest commit before building (although I think I did) and ended up with an old version. This might be the case, because I ended up with .pin files when removing the quant config part, and discovered in the CHANGELOG that that's how files were named prior to version 0.6.0. But I do doubt this, as I have built and run it in the past and have never seen the .pin files before.

lazear commented 1 year ago

OK, gotcha - I won't worry about it then :)