lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

diaPASEF first tests #123

Open KlemensFroehlich opened 4 months ago

KlemensFroehlich commented 4 months ago

hi Michael

Please forgive me for bringing this up again... In my defense: you said nothing is preventing users from converting .d folders to mzML and analyzing diaPASEF data :)

I have tried a few things to see how different software tools currently handle diaPASEF. For this I generated a diaPASEF run of ~10 ng human HEK (5 min active gradient) on a timsTOF Ultra. File size as a .d folder: 1.4 GB.

I have now tried 3 different ways to generate an mzML file:

  1. MSConvert, standard settings: I think this came down to 4.5 million scan events, giving a final size of 25 GB.
  2. MSConvert with "Combine ion mobility scans": 6.3 GB
  3. timsconvert, standard settings: 5.7 GB
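
For anyone wanting to reproduce option 2 from a script, here is a minimal sketch that only builds the MSConvert command line. It assumes the ProteoWizard `msconvert` executable is on the PATH and that its `--combineIonMobilitySpectra` switch corresponds to the GUI option mentioned above; the function name and the paths are hypothetical.

```python
def msconvert_command(d_folder, out_dir, combine_ion_mobility=True):
    """Build (but do not run) an msconvert argument list for a Bruker .d folder."""
    cmd = ["msconvert", d_folder, "--mzML", "-o", out_dir]
    if combine_ion_mobility:
        # Assumed to match the "Combine ion mobility scans" checkbox in the GUI.
        cmd.append("--combineIonMobilitySpectra")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where MSConvert and the Bruker reader are installed.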

I searched the data with Sage, and for comparison I also included a 30 min ddaPASEF run, which was analyzed directly from the .d folder.


The ddaPASEF search seems to work nicely judging by the q-value distribution at the spectrum, peptide and protein level, but for all other searches the q-value distribution looks odd to me.
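
For readers less familiar with the metric, the q-values discussed here come from target-decoy competition. The sketch below is a generic textbook version, not Sage's actual implementation; the function name and the "higher score is better" convention are assumptions.

```python
def q_values(scores, is_decoy):
    """Return target-decoy q-values in input order (higher score = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this score.
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value: the smallest FDR at this threshold or any more permissive one.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q
```

A healthy search usually shows many targets at very low q-values; if decoys score as well as targets, the q-values rise quickly and the distribution looks flat.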

I would be happy to share the data with you, or to do more testing and benchmarking. As I already mentioned, there is currently no non-proprietary option to analyze diaPASEF data with large search spaces, so I am highly motivated to get this off the ground.

You also said earlier that it is not trivial to collapse / handle the ion mobility in TIMS data. Would you be open to having a look at the MSConvert "combine ion mobility scans" option and at the resulting data structure? Maybe this can already solve the problem and Sage "only" needs to be adapted to handle this specific mzML?

Oh and btw working with sage made me smile more than once! It is a lot of fun to work with a tool so fast! I still remember the days I had to run semitryptic searches in MaxQuant for a week xD

Best, Klemens

lazear commented 4 months ago

I did say theoretically possible 😉 - there are clearly some practical issues.

I'm definitely interested in supporting diaPASEF. To be honest though, it will probably be ~2-3 months or so before I have time to really dig in and experiment (I have a lot going on until late spring!). It seems like there are a couple other people interested in perhaps trying it out too, and I am always happy to provide guidance on Sage internals - if there's a way to collapse into a single spectrum, plugging it into Sage should be straightforward.
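
To make "collapse into a single spectrum" concrete, here is a minimal sketch that merges the peaks of all mobility scans in one frame by m/z tolerance. It is an illustration under assumptions, not what Sage or MSConvert actually do: the function name, the input layout (a list of scans, each a list of `(mz, intensity)` tuples) and the 20 ppm default are all hypothetical. For diaPASEF the scans would presumably need to be grouped per isolation window first, since different mobility ranges in a frame can belong to different windows.

```python
def collapse_mobility_scans(scans, tol_ppm=20.0):
    """Merge peaks from all mobility scans of one frame into one peak list."""
    peaks = sorted(p for scan in scans for p in scan)
    merged = []  # each entry: [sum(mz * intensity), sum(intensity)]
    for mz, intensity in peaks:
        if intensity <= 0:
            continue  # skip empty peaks so centroids stay well defined
        if merged:
            centroid = merged[-1][0] / merged[-1][1]
            if (mz - centroid) / centroid * 1e6 <= tol_ppm:
                # Close enough to the running centroid: accumulate.
                merged[-1][0] += mz * intensity
                merged[-1][1] += intensity
                continue
        merged.append([mz * intensity, intensity])
    # Report intensity-weighted m/z and summed intensity per merged peak.
    return [(weighted / total, total) for weighted, total in merged]
```

This throws away the mobility dimension entirely, which keeps the spectrum count manageable but gives up the separation that ion mobility provides.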

If you're willing to share the files, I can download them and hack on them when I get a chance!

KlemensFroehlich commented 4 months ago

Thank you so much for even considering this!

please find all data (also a DIA-NN analysis directly from .d) here: SwitchDrive

If supporting diaPASEF is a long-term possibility, may I ask what you are thinking about, very roughly (I will not hold you to it, I promise, except if you say that I can do something theoretically :P)?

More specifically, I was wondering whether you would consider reaching out to Vadim Demichev to see whether it is possible to create the library with Sage and then do the main DIA analysis with DIA-NN. DIA-NN is not open source, though, so it might not be ideal. But looking at FragPipe, this could be a powerful combination of tools.

Again, if I can help in any way, please let me know! In particular, if there is any other metric from the files I provided that I should check or summarize, I would be happy to do so!

Best, Klemens

prvst commented 3 months ago

Hello @KlemensFroehlich and @lazear. I would like to point out that the FragPipe DIA workflow uses a program called easypqp (OpenMS) to build the library. The library is then passed to DIA-NN to perform the quantification.

If you go to the easypqp GitHub page, you'll see that someone already made this request to them.

Glad to see that people share the same interests and strategies.

Best

lazear commented 3 months ago

Yes, sage will write all matched fragments when using the --annotate-matches CLI flag, so it should be possible to plug into easypqp at some point, or other library building tools!
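
As a rough idea of what a downstream library builder would do with that output, the sketch below groups matched fragments by PSM and keeps the most intense ones. The column names (`psm_id`, `fragment_type`, `fragment_ordinals`, `fragment_mz_calculated`, `fragment_intensity`) and the tab-separated layout are assumptions about the matched-fragments file, and the function name is hypothetical; check the header of the actual file before relying on it.

```python
import csv
import io
from collections import defaultdict


def top_fragments(tsv_text, n=6):
    """Group matched fragments by PSM and keep the n most intense per PSM."""
    by_psm = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        by_psm[row["psm_id"]].append(
            (
                float(row["fragment_intensity"]),
                row["fragment_type"],
                int(row["fragment_ordinals"]),
                float(row["fragment_mz_calculated"]),
            )
        )
    # Tuples sort by intensity first, so reverse order puts the strongest on top.
    return {psm: sorted(frags, reverse=True)[:n] for psm, frags in by_psm.items()}
```

Joining these fragment lists to peptide sequences, charges and retention times from the main results table would then give the raw material for a spectral library.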

RobbinBouwmeester commented 3 months ago

@KlemensFroehlich @lazear, could this repo be useful? https://github.com/mafreitas/tdf2mzml. In the description they claim to support Bruker DIA data (and based on a closed discussion/issue on that repo, this concerns diaPASEF).