mgleeming / synthedia

Create synthetic DIA LC-MS/MS for proteomics experiments
https://synthedia.org
BSD 3-Clause "New" or "Revised" License

.mzML file crashes in FragPipe DIA analysis #37

Closed johawahn closed 1 year ago

johawahn commented 1 year ago

Hello!

I used a Prosit spectral library file to produce an mzML file with Synthedia (synthedia --prosit 'prosit_lib.csv' --centroid_ms1 --centroid_ms2). This works fine, but when I try to analyze the mzML file in FragPipe, it crashes during the MSFragger analysis. Have you ever tried analyzing Synthedia output with this software, or only with DIA-NN? Could the way the mzML file is structured cause a compatibility problem? I have attached the mzML file produced by Synthedia. (I have also tried adding a decoy file and an acquisition schema table, but the analysis still fails.)

Thank you for your help! Johanna

mgleeming commented 1 year ago

Hi Johanna,

Thanks for your note!

Yes, I have been able to run the synthetic mzML files using FragPipe and MSFragger without error. Sorry - I can't see any mzML file attached.

It may be an issue with the FragPipe configuration. Could you post the processing log? Remember to activate the DIA-Umpire node, which is required to assemble the pseudo-MS/MS spectra.

Thanks, Michael

johawahn commented 1 year ago

Hello Michael,

I got in contact with the FragPipe team as well. Apparently my data is "too perfect", which causes the MSFragger crash. I have attached the .mzML file and the assembly log. Do you have any advice on how I could increase the noise in the file?

output_group_0_sample_0_DIA.zip [output_assembly.log](https://github.com/mgleeming/synthedia/files/10741368/output_assembly.log)

Thank you so much for your help, Johanna

mgleeming commented 1 year ago

Hi Johanna,

Thanks for sending through the files. I can see that you've simulated 662 precursors over about 3.5 h. In our testing, we found that FragPipe didn't like it when too few precursors were present or when they were too spread out. We've successfully run hundreds of Umpire/Fragger analyses, but all were with >30,000 precursors and used gradients of <2 h.

You might need to add more precursors and compress the run time using the --new_run_length parameter which specifies the 'gradient' length in minutes.
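As a rough sketch of that advice, the Python snippet below counts the precursors in a Prosit-style library, estimates how densely they are spread over the run, and assembles a Synthedia command with a shorter run length. The column names `ModifiedPeptide` and `PrecursorCharge` are assumptions about the library layout, and the helper functions are hypothetical (they are not part of Synthedia). Only the flags already mentioned in this thread are used.

```python
import csv
import shlex


def count_precursors(library_path,
                     peptide_col="ModifiedPeptide",   # assumed column name
                     charge_col="PrecursorCharge"):   # assumed column name
    """Count unique (peptide, charge) pairs in a Prosit-style CSV library."""
    precursors = set()
    with open(library_path, newline="") as handle:
        for row in csv.DictReader(handle):
            precursors.add((row[peptide_col], row[charge_col]))
    return len(precursors)


def precursors_per_minute(n_precursors, run_length_min):
    """Average number of simulated precursors per minute of gradient."""
    return n_precursors / run_length_min


def build_command(library_path, run_length_min):
    """Build a Synthedia command line using only flags quoted in this thread."""
    args = [
        "synthedia",
        "--prosit", library_path,
        "--centroid_ms1",
        "--centroid_ms2",
        "--new_run_length", str(run_length_min),  # 'gradient' length in minutes
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Figures quoted in this thread: the failing run vs. the working example.
    print(round(precursors_per_minute(662, 3.5 * 60), 1))   # 3.2
    print(round(precursors_per_minute(10_000, 30), 1))      # 333.3
    print(build_command("prosit_lib.csv", 30))
```

The two densities illustrate the gap: roughly 3 precursors per minute in the run that crashed versus several hundred per minute in the runs that worked. No exact threshold is known from this thread, so treat the numbers as a guide only.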

Here's an example file that I've got to work just now that simulates 10,000 precursors over 30 mins. Maybe give it a try and see what happens :)

Thanks, Michael