smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License
91 stars 46 forks

Spectral library integration #1023

Closed Dmorgen closed 6 years ago

Dmorgen commented 6 years ago

How about hyphenating a spectral library search as a first pass before going on to database searching? With so much data available today, it's almost a crime not to combine it into a search pipeline... yet no existing pipeline couples good LFQ and a search engine to it...

Cheers, David.

trishorts commented 6 years ago

Delightful idea. We'll huddle the team and discuss. I love it.

rmillikin commented 6 years ago

I'm not too familiar with spectral libraries so I'm going to ask some rather ignorant questions to get started.

  1. Are you asking that we accept spectral libraries as input instead of a database in the first-pass GPTMD search to annotate possible PTMs, followed by a second-pass search with the annotated protein database? If so, what is the advantage of replacing the database search with a spectral library search? Is it to match fragment peak intensities in addition to mass? Or is it mainly for DIA searches?
  2. I assume you also want to be able to output search results as a spectral library? What is a typical pipeline for this, and what should we output? I think you mentioned pepXML before; if we output pepXML, is it easily convertible to a spectral library? Do you use SpectraST?
  3. I've very briefly looked into using metabolomics spectral libraries to see if we can do metabolomics in MetaMorpheus, but I only really saw low-resolution MS/MS information. The libraries are also dependent on fragmentation energy, fragmentation type, ionization mode, etc. This is kind of where my ignorance catches up to me. I don't really understand where you go to download the specific type of spectral library you want; is there some repository of spectral libraries somewhere that I can find, like if I'm studying Jurkat cell phosphoproteomics with HCD and 32 normalized collision energy and ESI+?
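
To make the "fragment peak intensities in addition to mass" question in item 1 concrete: spectral library search tools typically score a query spectrum against a library spectrum with some form of normalized dot product over matched peaks. Below is a minimal sketch of that idea, not how SpectraST or any other specific tool does it. The function names, the `(mz, intensity)` tuple layout, the 0.02 Da tolerance and the greedy peak pairing are all assumptions made for illustration.

```python
import math


def match_peaks(query, library, tol_da=0.02):
    """Pair each library peak with the closest unused query peak within tol_da.

    Peaks are (mz, intensity) tuples. Returns (query_intensity, library_intensity)
    pairs; an unmatched peak on either side is paired with 0.0.
    The greedy pairing is an illustrative simplification.
    """
    used = set()
    pairs = []
    for lib_mz, lib_int in library:
        best, best_delta = None, tol_da
        for i, (q_mz, _) in enumerate(query):
            delta = abs(q_mz - lib_mz)
            if i not in used and delta <= best_delta:
                best, best_delta = i, delta
        if best is None:
            pairs.append((0.0, lib_int))
        else:
            used.add(best)
            pairs.append((query[best][1], lib_int))
    # Query peaks with no library partner also count against the score.
    pairs.extend((q_int, 0.0) for i, (_, q_int) in enumerate(query) if i not in used)
    return pairs


def spectral_similarity(query, library, tol_da=0.02):
    """Normalized dot product (cosine) of matched peak intensities, in [0, 1]."""
    pairs = match_peaks(query, library, tol_da)
    dot = sum(q * l for q, l in pairs)
    q_norm = math.sqrt(sum(q * q for q, _ in pairs))
    l_norm = math.sqrt(sum(l * l for _, l in pairs))
    if q_norm == 0 or l_norm == 0:
        return 0.0
    return dot / (q_norm * l_norm)
```

Two spectra with the same fragment masses but different relative intensities score below 1 here, whereas a mass-only match would treat them as equivalent. That is also why the score is sensitive to fragmentation type and collision energy.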

trishorts commented 6 years ago

https://www.biorxiv.org/content/early/2018/03/07/277822

Comprehensive peptide quantification for data independent acquisition mass spectrometry using chromatogram libraries. Brian C Searle, Lindsay K Pino, Jarrett D Egertson, Ying S Ting, Robert T Lawrence, Judit Villen, Michael J MacCoss

Dmorgen commented 6 years ago

Hi,

I think I might be as ignorant :)

  1. My suggestion is to run a spectral library search first, followed by GPTMD and a regular search on those ions that were not identified by the spectral library search. I think that for model animals, the percentage of PSMs that have already been ID'd in the past will become dominant quickly. This should allow more complex GPTMD searches on the smaller number of MS2 scans that were not ID'd in the first pass. I think spectral libraries should improve several things: (a) search speed in general; (b) search speed on complex data in particular, such as glycoproteomics and top-down; (c) confidence of the ID, by building consensus spectra from multiple IDs (via fragment intensities too). All of this is more important when considering larger peptides, glycopeptides and other labile and difficult modifications. I'm actually not interested in DIA.

  2. Ideally, I would like to have an option to concatenate the results into a consensus spectral library. For example, if I search human data, I would like to have the strong IDs incorporated back into the spectral library automatically. Currently only SpectraST does this, and not automatically but via the command line. I am playing a bit with SpectraST at the moment, but it is not very easy for me: generating the libraries is difficult, performing the search (and FDR) is difficult, and data visualization is difficult... I was actually trying to generate libraries for analysis in PD2.2, combining MSPepSearch with MS Amanda for sequential analysis. It doesn't work (yet...). The output of interest, assuming you can combine it into the FDR and quantification pipeline, is both TSV and a format that can be viewed somehow (another sore point, I know...). My interest in pepXML was mainly because it can be viewed in Scaffold and BatMass (http://www.batmass.org/tutorial/overlay-peptide-ids-on-map2d/). If you plan on making a viewer of your own... :) I would say that the lack of a viewing option is a deterrent to some degree, especially with mods.

  3. You can check with NIST: http://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:cdownload. The libraries there are divided by instrument and organism. I think most of the Orbitrap HCD data should be HiHi, but I'm not sure.
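
The two-pass idea from item 1 could be wired together roughly like this. It is only a sketch of the routing, under stated assumptions: `library_search` and `database_search` are hypothetical callables standing in for real search engines (each takes a spectrum and returns `(peptide, score)` or `None`), and the 0.7 score cutoff is arbitrary. Combined FDR across the two passes is deliberately left out, since scores from the two engines are not directly comparable.

```python
def two_pass_search(scans, library_search, database_search, min_library_score=0.7):
    """Run a spectral library pass first, then a database pass on the leftovers.

    scans: dict mapping scan_id -> spectrum (any object the callables accept).
    library_search, database_search: hypothetical callables,
        spectrum -> (peptide, score) or None.
    Returns (identified, second_pass_ids), where identified maps
    scan_id -> (source, peptide, score) and second_pass_ids lists the scans
    that were handed to the database search.
    """
    identified = {}
    leftover = {}

    # Pass 1: accept only confident library hits.
    for scan_id, spectrum in scans.items():
        hit = library_search(spectrum)
        if hit is not None and hit[1] >= min_library_score:
            identified[scan_id] = ("library", hit[0], hit[1])
        else:
            leftover[scan_id] = spectrum

    # Pass 2: the (slower, more open) database search sees only the leftovers.
    for scan_id, spectrum in leftover.items():
        hit = database_search(spectrum)
        if hit is not None:
            identified[scan_id] = ("database", hit[0], hit[1])

    return identified, sorted(leftover)
```

The point of the split is that the expensive second pass (e.g. GPTMD with many candidate modifications) only runs on scans the library could not explain.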

rmillikin commented 6 years ago

I like this idea, and proteomics spectral library searching may be an intermediate stage on the way to MetaMorpheus-metabolomics. To be honest, it is not likely to happen soon: this is a fairly major change, and we're currently focusing on fixing crashes/bugs and other stability issues, then on making MM more user-friendly, in addition to our other non-MetaMorpheus projects. It is a very interesting new feature, though. I'm especially interested in matching fragment intensities, though again this depends on fragmentation type (e.g., HCD vs CID), collision energy, etc. At the very least we can aid spectral library generation by outputting pepXML. I'm hoping to have that done relatively soon (1-2 months).
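
For reference, the consensus-spectrum step discussed earlier in this thread (merging several identifications of the same peptide into one library entry) might look roughly like the sketch below. This is an illustration under assumptions, not the SpectraST algorithm: the fixed-width m/z binning, base-peak normalization, the 0.02 bin width and the "present in at least half the replicates" rule are all choices made here for simplicity.

```python
from collections import defaultdict


def consensus_spectrum(replicates, bin_width=0.02, min_fraction=0.5):
    """Merge replicate spectra of one peptide into a consensus peak list.

    replicates: list of spectra, each a list of (mz, intensity) tuples.
    Intensities are scaled to each replicate's base peak, peaks are grouped
    into fixed-width m/z bins, and a bin is kept only if it is populated in
    at least min_fraction of the replicates.
    Note: fixed bins can split a peak that sits on a bin edge; a real
    implementation would need something more careful.
    """
    if not replicates:
        return []

    bins = defaultdict(list)
    for peaks in replicates:
        if not peaks:
            continue
        base = max(intensity for _, intensity in peaks)
        if base <= 0:
            continue
        # Keep only the most intense peak per bin within one replicate.
        seen = {}
        for mz, intensity in peaks:
            key = round(mz / bin_width)
            rel = intensity / base
            if key not in seen or rel > seen[key][1]:
                seen[key] = (mz, rel)
        for key, value in seen.items():
            bins[key].append(value)

    needed = min_fraction * len(replicates)
    consensus = []
    for key in sorted(bins):
        members = bins[key]
        if len(members) >= needed:
            mean_mz = sum(mz for mz, _ in members) / len(members)
            # Average over all replicates, so missing peaks pull intensity down.
            mean_int = sum(rel for _, rel in members) / len(replicates)
            consensus.append((mean_mz, mean_int))
    return consensus
```

Peaks seen in only a minority of replicates are dropped as likely noise, and the averaged relative intensities are what a later intensity-aware library match would compare against.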

rmillikin commented 6 years ago

See #1124