mgleeming / synthedia

Create synthetic DIA LC-MS/MS for proteomics experiments
https://synthedia.org
BSD 3-Clause "New" or "Revised" License

scan time settings #41

Closed by im281 1 year ago

im281 commented 1 year ago

Hello, I used the following parameters and expected a 200 Hz acquisition rate (5 ms per scan, with 250 scans in a 1.25 s cycle):

synthedia --use_existing_peptide_file peptide_table.tsv --out_dir D:\synthedia\simulatedfiles\my file_2 --output_label 200hzscanrate --ms1_resolution 30000 --ms2_resolution 30000 --new_run_length 30 --resolution_at 200 --isolation_window 2 --ms1_scan_duration 0.005 --ms2_scan_duration 0.005 --ms1_min_mz 400 --ms1_max_mz 900 --centroid_ms1 --centroid_ms2 --original_run_length 0

However, the mzML file has only 15k scans in it. Can you please advise?
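For reference, the expected scan count behind this question can be worked out directly. This is only a back-of-the-envelope sketch: it assumes scans are acquired back to back with no overhead, and `expected_scans` is a hypothetical helper, not part of synthedia.

```python
def expected_scans(run_length_min: float, scan_duration_s: float) -> int:
    """Scans that fit in the run if they are acquired back to back.

    Assumption: no inter-scan overhead and every scan is written to the file.
    """
    return round(run_length_min * 60 / scan_duration_s)


# 30 min run with 5 ms per scan (200 Hz)
print(expected_scans(30, 0.005))
```

Under these assumptions a 30 min run at 5 ms per scan holds 360,000 scans, so a file with 15k scans is missing most of them.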

mgleeming commented 1 year ago

Hi!

Depending on the complexity of your input peptide set, many DIA windows may not contain precursors. By default, these 'empty' spectra are not written to the output file to save space and speed up the simulation. If you add the '--write_empty_spectra' flag, spectra that don't contain any ions will be written and you should see increased numbers of spectra.

Let me know if you have any problems. Michael

im281 commented 1 year ago

Thanks Michael, I'll try that and let you know. I do have a library of 50k peptides, btw. Can you send me the parameters to get a 200 Hz scan rate, and also a 30 min output file? It looks like the last RT is 15 min in the simulated file.

im281 commented 1 year ago

Can you simulate this? Are these parameters ok? synthedia --prosit human_library --ms1_ppm_error_mean 3 --ms2_ppm_error_mean 5 --ms1_resolution 240000 --ms2_resolution 80000 --resolution_at 200 --ms1_min_mz 400 --ms1_max_mz 900 --ms2_scan_duration 0.005 --isolation_window 2 --write_empty_spectra --rt_peak_fwhm_distribution_mean 6 --original_run_length 0 --new_run_length 30 --rt_buffer 5 --centroid_ms1 --centroid_ms2 --out_dir myoutput --output_label simulated --n_groups 1 --samples_per_group 1 --preview --all

im281 commented 1 year ago

I ran this on a 64-core machine and got this error. Any thoughts?

[image: error screenshot]

im281 commented 1 year ago

Tried those settings with --write_empty_spectra and the scan count is still far too low. At 200 Hz, a 30 min gradient should give about 360k scans (1800 s × 200 Hz). Also, the plots span 20 minutes even though I set the gradient time to 30? [image]

The mzML files don't look right either. Am I doing something wrong? Also, TOPPView crashes when I switch to 3D mode. Your example file works though: [image]

mgleeming commented 1 year ago

With those parameters above, you're only simulating 200 Hz for the MS2 spectra. The MS1 spectra are being simulated at the default of 0.37 s each which consumes a lot of time. To make a 200 Hz overall simulation, drop the ms1_scan_duration parameter to 0.005. Note that the spectrum resolution settings have no effect when centroid spectra are specified.
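The effect of the MS1 scan duration on the overall rate can be sketched numerically. This is a rough model rather than synthedia's actual scheduling: it assumes one MS1 scan per cycle followed by contiguous, non-overlapping MS2 windows across the MS1 range, and `cycle_stats` is a hypothetical helper.

```python
def cycle_stats(ms1_s, ms2_s, min_mz, max_mz, window):
    """Return (n_windows, cycle time in s, overall scans per second).

    Assumption: each cycle is one MS1 scan plus one MS2 scan per
    isolation window, with windows tiling [min_mz, max_mz] exactly.
    """
    n_windows = round((max_mz - min_mz) / window)
    cycle_s = ms1_s + n_windows * ms2_s
    rate_hz = (n_windows + 1) / cycle_s
    return n_windows, cycle_s, rate_hz


# Default MS1 duration quoted above (0.37 s) versus the suggested 0.005 s
for ms1 in (0.37, 0.005):
    n, cycle, rate = cycle_stats(ms1, 0.005, 400, 900, 2)
    print(f"MS1 {ms1} s: {n} windows, cycle {cycle:.3f} s, {rate:.1f} scans/s")
```

With a 2 m/z window over 400 to 900 m/z there are 250 windows per cycle. Under this model the 0.37 s MS1 scan stretches the cycle to 1.62 s (about 155 scans/s overall), while a 0.005 s MS1 scan gives a 1.255 s cycle and 200 scans/s.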

Running a similar simulation with the parameters below on a Prosit library I have here generates a file of 1912.12 seconds (31.87 min) in length with a total of 382,425 spectra, which is correct for 1912 s at 200 Hz. The TOPPView visualisation looks good too. Note that the extra bit of acquisition time prevents peptides from being 'clipped' when they elute at the boundaries of the acquisition time.
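To run the same sanity check on another output file, the spectra can be counted with the Python standard library alone. This is a minimal sketch, assuming the file is plain (uncompressed) mzML XML; `count_spectra` and `mean_scan_rate` are hypothetical helpers, not synthedia functions.

```python
import xml.etree.ElementTree as ET


def count_spectra(mzml_path):
    """Count <spectrum> elements by streaming through the file."""
    count = 0
    for _event, elem in ET.iterparse(mzml_path, events=("end",)):
        # Strip the XML namespace, e.g. "{http://psi.hupo.org/ms/mzml}spectrum"
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            count += 1
            elem.clear()  # drop peak data already parsed to limit memory use
    return count


def mean_scan_rate(n_spectra, run_length_s):
    """Average number of spectra per second over the whole run."""
    return n_spectra / run_length_s


# Figures quoted above: 382,425 spectra in 1912.12 s
print(round(mean_scan_rate(382_425, 1912.12), 2))
```

Dividing the spectrum count by the run length in seconds gives the mean scan rate, which for the figures quoted above comes out at 200 Hz to within rounding.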

synthedia --prosit myPrositLib.csv.short --ms1_ppm_error_mean 3 --ms2_ppm_error_mean 5 --ms1_min_mz 400 --ms1_max_mz 900 --ms2_scan_duration 0.005 --isolation_window 2 --write_empty_spectra --rt_peak_fwhm_distribution_mean 6 --new_run_length 30 --rt_buffer 5 --centroid_ms1 --centroid_ms2 --ms1_scan_duration 0.005

[image: screenshot_1]

I'm not sure where the 'banding' comes from in your simulation. Perhaps something to do with the peptide set in the prosit file? If you send it to me, I can have a look...

im281 commented 1 year ago

Thanks! You're probably right and it's related to the Prosit file. Note that this is a much bigger file than your examples, simulating around 250k peptides. I'm uploading the file now and it should take about 20 min. Here is the OneDrive link to download the Prosit file:

https://seerbio-my.sharepoint.com/:f:/g/personal/imohtashemi_seer_bio/EokKsbP_Q_5DkyXYt0-7ib0BLOFx_u1BFD12HKV45iobSQ?e=kuwie0

If you can upload a correct mzML file, that would be great. I am also running into numpy memory allocation errors when I use write_empty_spectra, even on a huge EC2 instance with 244 GB RAM.

I also created a new file where I used AutoRT to adjust the iRT, and these values are more sensible: plasma_9k_lib_prosit_autort

I'd like to simulate three 30 min files at 80k, 30k and 10k resolution. I'd prefer to do this myself so as not to bother you, but I'm having some issues.

im281 commented 1 year ago

Hi Michael, Any luck or insight on this? The plasma_9k_lib_prosit_autort file should be good. Best regards,

mgleeming commented 1 year ago

Hi,

Here's a file containing my simulation as above (on a small Prosit library). I've included the Prosit file, mzML file, and output tables as well. https://filesender.aarnet.edu.au/?s=download&token=802f2e19-d282-4450-af9b-c7456160fb62

For the memory error, I've never tried simulating that many ions in a single run. Could you send me the log file so I can see where it crashed? A couple of ideas: make sure you're using the latest version, since there were some memory improvements in 1.0.3. Try simulating a smaller Prosit library first and check that it goes through OK. Are you simulating profile data? If so, try centroid first and get everything working on that.
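One way to try a smaller library first is to cut the existing file down to its first few peptides. The sketch below is an illustration only: it assumes a comma-separated Prosit library with a `ModifiedPeptide` column (adjust `peptide_col` if the file differs), and `subset_library` is a hypothetical helper, not part of synthedia.

```python
import csv


def subset_library(src, dst, max_peptides, peptide_col="ModifiedPeptide"):
    """Copy all rows belonging to the first `max_peptides` distinct peptides.

    Assumption: `src` is a comma-separated file with a header row that
    contains `peptide_col`. Returns the number of peptides kept.
    """
    kept = set()
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            peptide = row[peptide_col]
            if peptide not in kept:
                if len(kept) >= max_peptides:
                    continue  # new peptide, but the quota is already full
                kept.add(peptide)
            writer.writerow(row)  # keep every fragment row of a kept peptide
    return len(kept)


# Hypothetical usage (file names are placeholders):
# subset_library("myPrositLib.csv", "myPrositLib.small.csv", max_peptides=1000)
```

Keeping whole peptides rather than the first N rows avoids truncating a peptide's fragment list partway through.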

Thanks, Michael

im281 commented 1 year ago

Hi Michael, Thank you and I will try your suggestions shortly. I'm simulating centroid data but will try the smaller library first

im281 commented 1 year ago

I tried simulating the same number of peptides for two different instrument configurations:

- scan rate of around 30 Hz
- scan rate of around 200 Hz

Oddly, the lower scan rate produced higher ID rates using DIA-NN. Can I send you the peptide library so you can try it and send me the settings? I might be doing something wrong in the settings.

im281 commented 1 year ago

In addition, DIA with a 10 amu window gives better results than with a 2 amu window?