about data for PMDDA workflow

wushaowen1992 commented 2 years ago

Hi yufree, I am still trying to reproduce your PMDDA workflow, and I have some questions about the data used in this analysis. For the raw data of the nist1950 samples, you have five mzML files (which I will call "mzML for precursor ions") for both positive and negative modes, and those files were used to extract the precursor ion list (as I understand it, the MS1 list). After generating precursor ion peak lists using different methods, such as PMDDA, CAMERA, and RAMClustR, the PMDDA workflow goes to a step named "MS2 data collection", and multiple mzML files (which I will call "mzML for MS2 data collection") are downloaded from the links that you provide in the script. What I don't understand is how these mzML files for MS2 data collection were obtained. Were they extracted from the original mzML files (mzML for precursor ions) using the generated precursor ion peak lists? If so, how can I do that myself? Or do the mzML for precursor ions and the mzML for MS2 data collection come from different LC-MS measurements?

After that, the analysis goes to the GNPS molecular networking comparison step. Which website should I use for the GNPS molecular networking analysis, and which files are needed as input?

I just want to understand your workflow fully. Thank you very much for your kind help again.

yufree commented 2 years ago

Hi, the MS2 files are generated from the MS1 precursor lists by running an LC-MS/MS instrument in precursor ion selection mode. In this demonstration, I used an LC-qToF to collect MS1 full scan data and generated precursor lists, split across multiple injections, for MS2 data collection. Multiple injections are used to ensure the quality of the MS2 spectra. I then used the LC-qToF again to collect MS2 fragment ion data with the precursor lists from MS1. You can treat this process as a targeted analysis in which the identities of the precursor ions are not known. You can also use an LC-qqq for MS2 data collection, depending on your instrument resources or database (high or low resolution).
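To make the idea of spreading one precursor list over multiple injections concrete, here is a minimal Python sketch. It is not the PMDDA implementation: the `Precursor` class, the `split_into_injections` function, the greedy rule, and the 10 s retention-time window are all assumptions chosen for illustration. The point is only that co-eluting precursors are pushed into separate injections so the instrument is not asked to fragment too many ions at the same time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float  # precursor m/z from the MS1 peak list
    rt: float  # retention time in seconds


def split_into_injections(precursors, rt_window=10.0, max_per_window=1):
    """Greedily assign precursors to injections (hypothetical rule).

    A precursor is placed in the first injection that has fewer than
    `max_per_window` precursors eluting within `rt_window` seconds of it.
    If no injection qualifies, a new injection is started.
    """
    injections = []
    for p in sorted(precursors, key=lambda x: x.rt):
        for inj in injections:
            nearby = sum(1 for q in inj if abs(q.rt - p.rt) < rt_window)
            if nearby < max_per_window:
                inj.append(p)
                break
        else:
            # every existing injection is already crowded around this rt
            injections.append([p])
    return injections


# Made-up precursor list, for illustration only.
demo = [
    Precursor(150.1, 100.0),
    Precursor(151.1, 103.0),
    Precursor(152.1, 106.0),
    Precursor(153.1, 200.0),
    Precursor(154.1, 204.0),
]
for i, inj in enumerate(split_into_injections(demo), start=1):
    print(f"injection {i}: {[(p.mz, p.rt) for p in inj]}")
```

Each resulting list would then be exported in whatever inclusion-list format your instrument software expects, which is vendor specific and not shown here.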

The GNPS website is here: https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp You need to register a free account, and they only accept MS2 data (both high and low resolution) for identification purposes. For PMDDA, you will need to upload the MS2 data collected in precursor ion selection mode. You can check their documentation here: https://ccms-ucsd.github.io/GNPSDocumentation/networking/

wushaowen1992 commented 2 years ago

So, do you collect the MS2 raw data on the LC-MS after you obtain the precursor lists? In other words, do you run a new experiment? I discussed this with our technician and learned that we use an Orbitrap system to collect the LC-MS raw data, which usually collects MS1 and MS2 data simultaneously and stores them in one .raw file. I can convert this .raw file to an mzML file with ProteoWizard. In this case, how can I obtain the precursor lists and the corresponding MS2 mzML file to use with GNPS? Thank you again.

yufree commented 2 years ago

It seems your lab used a regular metabolomics workflow to collect DDA/DIA data, which is not preferred for quantitative analysis. You might check the PMDDA publication to see why I prefer a two-step data collection workflow. In brief, I prefer to take advantage of both MS1 full scan for quantitative analysis and pseudo-targeted MS2 for qualitative analysis. If you collect MS1 and MS2 at the same time, you lose MS1 precursor coverage and have limited control over the MS2 precursor selection process.

You can directly upload your MS1-MS2 mzML files to GNPS for annotation, which is the regular route. However, you need to link your MS1 full scan quantitative results and MS2 annotation results yourself, and you will find that most of the DDA/DIA annotation results have no associated MS1 full scan data. If you didn't collect MS1 full scan data and only collected DDA/DIA results for quantitative analysis or discussion, I would not be convinced by the results as a reviewer.
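As a rough illustration of what linking the two result sets yourself can look like, here is a minimal Python sketch that matches MS1 features to MS2 annotations by precursor m/z and retention time. The `link_features` function, the dictionary keys, and the 10 ppm and 15 s tolerances are assumptions for this example; they are not GNPS output fields or PMDDA defaults, and the values are made up.

```python
def link_features(ms1_features, ms2_annotations, ppm=10.0, rt_tol=15.0):
    """Map each MS1 feature id to the closest MS2 annotation name, or None.

    A match requires the precursor m/z to agree within `ppm` and the
    retention time (seconds) to agree within `rt_tol`. When several
    annotations match, the one with the smallest ppm error wins.
    """
    links = {}
    for feat in ms1_features:
        best_name, best_err = None, None
        for ann in ms2_annotations:
            ppm_err = abs(ann["precursor_mz"] - feat["mz"]) / feat["mz"] * 1e6
            if ppm_err <= ppm and abs(ann["rt"] - feat["rt"]) <= rt_tol:
                if best_err is None or ppm_err < best_err:
                    best_name, best_err = ann["name"], ppm_err
        links[feat["id"]] = best_name
    return links


# Made-up values, for illustration only.
ms1 = [
    {"id": "F1", "mz": 180.0634, "rt": 120.0},
    {"id": "F2", "mz": 256.2402, "rt": 300.0},
]
ms2 = [
    {"name": "compound_A", "precursor_mz": 180.0636, "rt": 125.0},
    {"name": "compound_B", "precursor_mz": 300.1000, "rt": 400.0},
]
links = link_features(ms1, ms2)
unlinked = [fid for fid, name in links.items() if name is None]
print(links, unlinked)
```

Features that come back as `None` are quantified in MS1 but have no MS2 annotation, and annotations that never appear in the result have no MS1 feature behind them.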

If you use commercial software like Compound Discoverer, you can ignore my comments, as I have no idea about their data processing details and they are designed for customers rather than researchers.

wushaowen1992 commented 2 years ago

I see. Thank you very much for this detailed explanation.

yufree / xcmsrocker

about data for PMDDA workflow #4