vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.
Other

DIA-NN doesn't load some Astral raw files #1143

Open weixiandeng opened 3 weeks ago

weixiandeng commented 3 weeks ago

Hi Vadim,

I've been running into this issue: raw files acquired on the Orbitrap Astral are not loaded by DIA-NN, whether I use version 1.8, 1.9 or 1.9.1. It happens occasionally, not for all raw files. No error message is generated; the program just finishes without any output. Converting those files to mzML solves it. I'm wondering what the reason is and whether there will be a fix for this issue.

best, Weixian
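The mzML workaround described above can be scripted for a batch of files. This is a minimal sketch, assuming ProteoWizard's msconvert is installed and on PATH (the thread does not say which converter was used); the function names and example paths are hypothetical. By default it only builds the commands, so you can inspect them before running anything.

```python
"""Sketch: build ProteoWizard msconvert commands to turn Thermo .raw files into mzML.

Assumption: msconvert is on PATH. convert_all and msconvert_command are
hypothetical helpers, not part of DIA-NN or ProteoWizard.
"""
import subprocess
from pathlib import Path


def msconvert_command(raw_file, out_dir):
    # --mzML selects mzML output; -o sets the output directory
    return ["msconvert", str(raw_file), "--mzML", "-o", str(out_dir)]


def convert_all(raw_files, out_dir, dry_run=True):
    """Return one command per file; if dry_run is False, also run them in series."""
    commands = [msconvert_command(f, out_dir) for f in raw_files]
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True raises CalledProcessError if a conversion fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical paths, for illustration only
    for cmd in convert_all([r"C:\data\run1.raw", r"C:\data\run2.raw"], r"C:\data\mzML"):
        print(" ".join(cmd))
```

Converting one file at a time keeps a failure tied to a specific file, which helps when only some Astral files are affected.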

vdemichev commented 3 weeks ago

Hi Weixian,

A bug was reported in a library used by DIA-NN that caused this with Astral files; it will be fixed in the upcoming 1.9.2 release. I suspect what you observe is this bug.

Best, Vadim

weixiandeng commented 3 weeks ago

Thank you Vadim for your prompt response! Looking forward to the fix.

Weixian


KentirLemu commented 2 weeks ago

Hi Vadim,

I have the same problem with Astral DIA data, and converting to mzML usually fixes the issue. However, I've been trying to search two different data sets. Because of time constraints, I searched them in parallel in .raw format. The search failed, so I tried searching the mzML files, but that also failed. I also tried searching them in series instead of in parallel, and it still failed.

Do you know what might be causing this issue, and if so, how can I make sure the search won't fail in the future? Here are the logs of my last search attempt (files converted to mzML and searched in series). FYI, the data were downloaded from our Ardia platform and saved locally before the search:

diann.exe --f "C:\Users\Thermo\Downloads\20240823_69b.mzML " --f "C:\Users\Thermo\Downloads\20240823_68a.mzML " --f "C:\Users\Thermo\Downloads\20240823_68b.mzML " --f "C:\Users\Thermo\Downloads\20240823_PBS.mzML " --f "C:\Users\Thermo\Downloads\20240823_69a.mzML " --lib "E:\Spectral Library\Mouse_Library.predicted.speclib" --threads 56 --verbose 1 --out "E:\MZML_23Aug\report.tsv" --qvalue 0.01 --matrices --predictor --var-mods 1 --var-mod UniMod:35,15.994915,M --reanalyse --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Wed Aug 28 17:15:55 2024
CPU: GenuineIntel Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz
SIMD instructions: AVX AVX2 AVX512CD AVX512F FMA SSE4.1 SSE4.2
Logical CPU cores: 56
Thread number set to 56
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Deep learning will be used to generate a new in silico spectral library from peptides provided
Maximum number of variable modifications set to 1
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

5 files will be processed
[0:00] Loading spectral library E:\Spectral Library\Mouse_Library.predicted.speclib
[0:06] Library annotated with sequence database(s): E:\Spectral Library\uniprotkb_Mouse_Mus_musculus.fasta
[0:06] Protein names missing for some isoforms
[0:06] Gene names missing for some isoforms
[0:06] Library contains 17265 proteins, and 16873 genes
[0:07] Spectral library loaded: 17265 protein isoforms, 22654 protein groups and 4946466 precursors in 1538977 elution groups.
[0:07] Encoding peptides for spectra and RTs prediction
[0:17] Predicting spectra and IMs
[17:14] Predicting RTs
[21:22] Decoding predicted spectra and IMs
[21:27] Decoding RTs
[21:33] Saving the library to lib.predicted.speclib
[21:40] Initialising library

[21:45] First pass: generating a spectral library from DIA data
[21:45] File #1/5
[21:45] Loading run C:\Users\Thermo\Downloads\20240823_69b.mzML
[25:52] 3061466 library precursors are potentially detectable
[25:52] Processing...
[35:36] RT window set to 0.994562
[35:36] Peak width: 3.688
[35:36] Scan window radius set to 8
[35:36] Recommended MS1 mass accuracy setting: 4.43896 ppm
[46:30] Optimised mass accuracy: 11.6512 ppm
[49:28] Removing low confidence identifications
[49:28] Removing interfering precursors
[49:30] Training neural networks: 11882 targets, 6501 decoys
[49:31] Number of IDs at 0.01 FDR: 3386
[49:31] Calculating protein q-values

DIA-NN exited
DIA-NN-plotter.exe "E:\MZML_23Aug\report.stats.tsv" "E:\MZML_23Aug\report.tsv" "E:\MZML_23Aug\report.pdf"
PDF report will be generated in the background

diann.exe --f "C:\Users\Thermo\Downloads\20240826_69b.mzML " --f "C:\Users\Thermo\Downloads\20240826_68b.mzML " --f "C:\Users\Thermo\Downloads\20240826_68a.mzML " --f "C:\Users\Thermo\Downloads\20240826_69a.mzML " --f "C:\Users\Thermo\Downloads\20240826_PBS.mzML " --lib "E:\Spectral Library\Mouse_Library.predicted.speclib" --threads 56 --verbose 1 --out "E:\MZML_26Aug\report.tsv" --qvalue 0.01 --matrices --predictor --var-mods 1 --var-mod UniMod:35,15.994915,M --reanalyse --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Wed Aug 28 18:05:54 2024
CPU: GenuineIntel Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz
SIMD instructions: AVX AVX2 AVX512CD AVX512F FMA SSE4.1 SSE4.2
Logical CPU cores: 56
Thread number set to 56
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Deep learning will be used to generate a new in silico spectral library from peptides provided
Maximum number of variable modifications set to 1
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

5 files will be processed
[0:00] Loading spectral library E:\Spectral Library\Mouse_Library.predicted.speclib
[0:06] Library annotated with sequence database(s): E:\Spectral Library\uniprotkb_Mouse_Mus_musculus.fasta
[0:06] Protein names missing for some isoforms
[0:06] Gene names missing for some isoforms
[0:06] Library contains 17265 proteins, and 16873 genes
[0:07] Spectral library loaded: 17265 protein isoforms, 22654 protein groups and 4946466 precursors in 1538977 elution groups.
[0:07] Encoding peptides for spectra and RTs prediction
[0:18] Predicting spectra and IMs
[17:05] Predicting RTs
[21:28] Decoding predicted spectra and IMs
[21:34] Decoding RTs
[21:40] Saving the library to lib.predicted.speclib
[21:46] Initialising library

[21:51] First pass: generating a spectral library from DIA data
[21:51] File #1/5
[21:51] Loading run C:\Users\Thermo\Downloads\20240826_69b.mzML
[25:37] 3061466 library precursors are potentially detectable
[25:37] Processing...
[34:58] RT window set to 1.00154
[34:58] Peak width: 3.74
[34:58] Scan window radius set to 8
[34:58] Recommended MS1 mass accuracy setting: 4.406 ppm
[45:20] Optimised mass accuracy: 10.1147 ppm
[47:34] Removing low confidence identifications
[47:34] Removing interfering precursors
[47:35] Training neural networks: 12038 targets, 6676 decoys
[47:36] Number of IDs at 0.01 FDR: 2611
[47:36] Calculating protein q-values

DIA-NN exited
DIA-NN-plotter.exe "E:\MZML_26Aug\report.stats.tsv" "E:\MZML_26Aug\report.tsv" "E:\MZML_26Aug\report.pdf"
PDF report will be generated in the background
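Both logs show the same pattern: five files are announced, but DIA-NN exits during the first pass right after file #1 without an error message. A check like the one below could flag such runs automatically. It is a heuristic sketch, assuming the log text follows the format shown above (a "DIA-NN x.y.z" version line, an "N files will be processed" line, and "[m:ss] File #k/N" progress lines); the function names and the default 1.9.2 threshold are hypothetical choices, with 1.9.2 taken from the release the maintainer mentioned for the fix.

```python
"""Sketch: flag a DIA-NN run that stopped before reaching every input file.

Assumption: the log looks like the excerpts in this thread. check_diann_log and
parse_version are hypothetical helpers, not part of DIA-NN.
"""
import re


def parse_version(log_text):
    """Return the DIA-NN version as a tuple of ints, or None if not found."""
    m = re.search(r"DIA-NN (\d+(?:\.\d+)+)", log_text)
    return tuple(int(p) for p in m.group(1).split(".")) if m else None


def check_diann_log(log_text, fixed_in=(1, 9, 2)):
    """Summarise progress and flag an early exit; fixed_in is a hypothetical threshold."""
    m = re.search(r"(\d+) files will be processed", log_text)
    total = int(m.group(1)) if m else None
    # Highest file index seen in any "File #k/N" progress line
    reached = [int(k) for k in re.findall(r"File #(\d+)/\d+", log_text)]
    max_file = max(reached, default=0)
    # Last timestamped message, e.g. "[49:31] Calculating protein q-values"
    steps = re.findall(r"\[([\d:]+)\] ([^\[\n]+)", log_text)
    last_step = steps[-1][1].strip() if steps else None
    version = parse_version(log_text)
    return {
        "version": version,
        "files_expected": total,
        "max_file_reached": max_file,
        "stopped_early": total is not None and max_file < total,
        "last_step": last_step,
        "older_than_fix": version is not None and version < fixed_in,
    }


if __name__ == "__main__":
    sample = (
        "DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)\n"
        "5 files will be processed\n"
        "[21:45] File #1/5\n"
        "[49:31] Calculating protein q-values\n"
        "DIA-NN exited\n"
    )
    print(check_diann_log(sample))
```

Applied to either log above, this reports version (1, 8, 1), 5 files expected, only file #1 reached, and "Calculating protein q-values" as the last step. One limitation: with `--reanalyse` the files are processed in two passes, so a crash partway through the second pass would still show file #5 as reached and would not be flagged.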

vdemichev commented 2 weeks ago

Hi,

Can you please try with 1.9.1? If the problem still occurs with 1.9.1 (unlikely), I would be grateful if you could share the raw file and the .predicted.speclib so that I can take a look.

Best, Vadim

KentirLemu commented 2 weeks ago

Hi Vadim,

I hadn't seen your message, so I tried version 1.9 instead (since 1.9.1 has the "pre-release" label). It seems to work fine for this set of samples, but I can't find the "Unique Protein" bar chart in the PDF report. Has this feature been removed in version 1.9?