DIA-NN 1.7.12 on linux throws WARNING: at least one library precursor had non-positive charge; corrected to charge = 2

vidyavenkatraman commented 3 years ago

I am trying to run DIA-NN 1.7.12 on Linux. I centroided my raw file and converted it to mzML. After library initialization, I get the message "WARNING: at least one library precursor had non-positive charge; corrected to charge = 2"; the run then produces a core dump file and exits with an empty out.tsv.

I have run the raw files and library through DIA-NN on Windows before, and that works as expected. Note: this library was generated using the Windows GUI. Can you provide any insight into where I might be going wrong?

DIA-NN 1.7.12 (Data Independent Acquisition by Neural Networks)
Compiled on Jul 1 2021 12:25:02
Current date and time: Thu Jul 8 11:55:27 2021
Logical CPU cores: 24
Thread number set to 8
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated

1 files will be processed
[0:00] Loading spectral library Saddic_DIANN_Library.predicted.speclib
[0:28] Library annotated with sequence database(s): X:\Sarah Parker\FASTA\UNIPROT_MusMusculus-proteome_UP000000589+AND+reviewed_yes_March2021_iRT_DECOY.fasta.txt
[0:28] Gene names missing for some isoforms
[0:28] Library contains 17048 proteins, and 16670 genes
[0:28] Assembling elution groups
[0:32] Spectral library loaded: 34100 protein isoforms, 42011 protein groups and 7291033 precursors in 1 elution groups.
[0:32] Initialising library
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2
WARNING: at least one library precursor had non-positive charge; corrected to charge = 2

[0:34] File #1/1
[0:34] Loading run 1F25_Saddic_MouseAorta_16A_Marfan.mzML
[1:01] 0 library precursors are potentially detectable
[1:01] Processing...
[1:01] Removing interfering precursors
[1:01] Too few confident identifications, neural network will not be used
[1:01] Number of IDs at 0.01 FDR: 0
[1:01] Calculating protein q-values
[1:01] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[1:01] Quantification
[1:01] Quantification information saved to TEMP/1F25_Saddic_MouseAorta_16A_Marfan_mzML.quant.

[1:01] Cross-run analysis
[1:01] Reading quantification information: 1 files
[1:01] Quantifying peptides
[1:01] Assembling protein groups
[1:03] Quantifying proteins
[1:03] Calculating q-values for protein and gene groups
[1:03] Writing report

vdemichev commented 3 years ago

Hi Vidya,

Please use DIA-NN 1.8; binaries can be downloaded here: https://github.com/vdemichev/DiaNN/releases/tag/1.8 The source code on GitHub cannot be compiled correctly because it is from an older version of DIA-NN. This warning means that the library was not loaded correctly, in this case most likely because the library was generated by a later DIA-NN version.
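As a rough illustration of how the symptoms above could be spotted automatically, here is a minimal Python sketch that scans the text of a DIA-NN log for the charge warning and for runs reporting "0 library precursors are potentially detectable". The function name scan_diann_log and the "suspicious" rule are assumptions made for this example and are not part of DIA-NN; the strings it looks for are taken from the log posted in this thread.

```python
import re

# Strings copied from the log in this thread; the helper itself is hypothetical.
WARNING_TEXT = "non-positive charge"
DETECTABLE_RE = re.compile(r"(\d+) library precursors are potentially detectable")
VERSION_RE = re.compile(r"DIA-NN (\d+(?:\.\d+)+)")


def scan_diann_log(text):
    """Summarise signs of a misloaded library or unreadable runs in log text."""
    version = VERSION_RE.search(text)
    charge_warnings = text.count(WARNING_TEXT)
    detectable = [int(n) for n in DETECTABLE_RE.findall(text)]
    return {
        "version": version.group(1) if version else None,
        "charge_warnings": charge_warnings,
        "detectable_per_run": detectable,
        # Assumed rule of thumb: any charge warning, or any run with zero
        # detectable precursors, deserves a closer look.
        "suspicious": charge_warnings > 0 or any(n == 0 for n in detectable),
    }
```

To use it, read a saved log into a string (for example with open(path).read()) and pass that string to scan_diann_log. Comparing the reported version with the version that generated the library is left to the reader, since the library file format is not described here.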

Hope this helps!

Best, Vadim

vidyavenkatraman commented 3 years ago

I was able to install DIA-NN 1.8 using Singularity, since I am running on CentOS 7.5. I don't get any error messages, but the outputs are empty. See the log below:

DIA-NN 1.8 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 28 2021 10:59:57
Current date and time: Mon Jul 12 13:24:38 2021
Logical CPU cores: 8
Thread number set to 8
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Neural networks will be used for peak selection
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

2 files will be processed
[0:00] Loading spectral library Saddic_DIANN_Library.predicted.speclib
[0:07] Library annotated with sequence database(s): X:\Sarah Parker\FASTA\UNIPROT_MusMusculus-proteome_UP000000589+AND+reviewed_yes_March2021_iRT_DECOY.fasta.txt
[0:07] Gene names missing for some isoforms
[0:07] Library contains 17048 proteins, and 16670 genes
[0:10] Spectral library loaded: 34100 protein isoforms, 42011 protein groups and 7291033 precursors in 3744120 elution groups.
[0:10] Initialising library

[0:24] First pass: generating a spectral library from DIA data
[0:24] File #1/2
[0:24] Loading run /common/venkatramanv/Data/Test/DIANN/1F25_Saddic_MouseAorta_16A_Marfan.mzML
[1:22] 0 library precursors are potentially detectable
[1:22] Processing...
[1:22] Removing low confidence identifications
[1:22] Removing interfering precursors
[1:22] Too few confident identifications, neural networks will not be used
[1:22] Number of IDs at 0.01 FDR: 0
[1:22] Calculating protein q-values
[1:22] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[1:22] Quantification
[1:22] Quantification information saved to TEMP/_common_venkatramanv_Data_Test_DIANN_1F25_Saddic_MouseAorta_16A_Marfan_mzML.quant.

[1:22] File #2/2
[1:22] Loading run /common/venkatramanv/Data/Test/DIANN/1F25_Saddic_MouseAorta_16R_Marfan.mzML
[2:21] 0 library precursors are potentially detectable
[2:21] Processing...
[2:21] Removing low confidence identifications
[2:21] Removing interfering precursors
[2:21] Too few confident identifications, neural networks will not be used
[2:21] Number of IDs at 0.01 FDR: 0
[2:21] Calculating protein q-values
[2:21] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[2:21] Quantification
[2:21] Quantification information saved to TEMP/_common_venkatramanv_Data_Test_DIANN_1F25_Saddic_MouseAorta_16R_Marfan_mzML.quant.

[2:21] Cross-run analysis
[2:21] Reading quantification information: 2 files
[2:21] Quantifying peptides
WARNING: not enough peptides for normalisation
[2:21] Assembling protein groups
[2:26] Quantifying proteins
[2:26] Calculating q-values for protein and gene groups
[2:28] Calculating global q-values for protein and gene groups
[2:28] Writing report
[2:28] Report saved to Saddic_LowInput-first-pass.tsv.
[2:28] Saving precursor levels matrix
[2:28] Precursor levels matrix (1% precursor and protein group FDR) saved to Saddic_LowInput-first-pass.pr_matrix.tsv.
[2:28] Saving protein group levels matrix
[2:28] Saving gene group levels matrix
[2:28] Saving unique genes levels matrix
[2:28] Stats report saved to Saddic_LowInput-first-pass.stats.tsv
[2:28] Generating spectral library:
[2:28] Reading quantification information: 2 files
[2:28] Assembling protein groups
[2:33] 0 precursors passing the FDR threshold are to be extracted
[2:33] Saving spectral library to Saddic_Library_DIANN.tsv
[2:33] 0 precursors saved
[2:33] Loading the generated library and saving it in the .speclib format
[2:33] Loading spectral library Saddic_Library_DIANN.tsv
[2:33] Spectral library loaded: 0 protein isoforms, 0 protein groups and 0 precursors in 1 elution groups.
[2:33] Library contains 0 proteins, and 0 genes
[2:33] Saving the library to Saddic_Library_DIANN.tsv.speclib

[2:37] Second pass: using the newly created spectral library to reanalyse the data
[2:37] File #1/2
[2:37] Loading run /common/venkatramanv/Data/Test/DIANN/1F25_Saddic_MouseAorta_16A_Marfan.mzML
[2:59] 0 library precursors are potentially detectable
[2:59] Processing...
[2:59] Removing low confidence identifications
[2:59] Removing interfering precursors
[2:59] Too few confident identifications, neural networks will not be used
[2:59] Number of IDs at 0.01 FDR: 0
[2:59] Calculating protein q-values
[2:59] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[2:59] Quantification

[2:59] File #2/2
[2:59] Loading run /common/venkatramanv/Data/Test/DIANN/1F25_Saddic_MouseAorta_16R_Marfan.mzML
[3:22] 0 library precursors are potentially detectable
[3:22] Processing...
[3:22] Removing low confidence identifications
[3:22] Removing interfering precursors
[3:22] Too few confident identifications, neural networks will not be used
[3:22] Number of IDs at 0.01 FDR: 0
[3:22] Calculating protein q-values
[3:22] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[3:22] Quantification

[3:22] Cross-run analysis
[3:22] Reading quantification information: 2 files
[3:22] Quantifying peptides
WARNING: not enough peptides for normalisation
[3:22] Quantifying proteins
[3:23] Calculating q-values for protein and gene groups
[3:24] Calculating global q-values for protein and gene groups
[3:24] Writing report
[3:24] Report saved to Saddic_LowInput.tsv.
[3:24] Saving precursor levels matrix
[3:24] Precursor levels matrix (1% precursor and protein group FDR) saved to Saddic_LowInput.pr_matrix.tsv.
[3:24] Saving protein group levels matrix
[3:24] Saving gene group levels matrix
[3:24] Saving unique genes levels matrix
[3:24] Stats report saved to Saddic_LowInput.stats.tsv
[3:24] Log saved to Saddic_LowInput.log.txt
Finished

Any idea what the issue might be? I did convert the raw files to mzML using ProteoWizard, as per the tutorial.

vdemichev commented 3 years ago

Hi Vidya,

For whatever reason, these .mzML files are not loaded correctly. Options to troubleshoot:

- Convert to .dia rather than .mzML (it's a better idea anyway).
- Check whether these .mzML/.raw files are processed properly by DIA-NN on Windows.
- Use DIA-NN on Linux under Wine (that is, run diann.exe or DIA-NN.exe under Wine) and access the .raw files directly; this depends on whether Wine 6.8 or later works on this Linux version.
- If the problem persists, please run DIA-NN on these files with the --export-windows option and share the .txt files created next to the raw files; these will contain the list of DIA isolation windows used.
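Independently of DIA-NN, the isolation windows recorded in an .mzML file can be listed with a few lines of Python. The sketch below is not what --export-windows does internally (that is not described in this thread); it simply reads every isolationWindow element and uses the PSI-MS accessions MS:1000827 (isolation window target m/z), MS:1000828 (lower offset) and MS:1000829 (upper offset). The function name isolation_windows is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# PSI-MS controlled-vocabulary accessions for isolation window parameters.
ISOLATION_ACCESSIONS = {
    "MS:1000827": "target",
    "MS:1000828": "lower_offset",
    "MS:1000829": "upper_offset",
}


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def isolation_windows(mzml_source):
    """Return sorted ((lower m/z, upper m/z), count) pairs found in an mzML file."""
    windows = Counter()
    for _, elem in ET.iterparse(mzml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "isolationWindow":
            values = {}
            for child in elem:
                key = ISOLATION_ACCESSIONS.get(child.get("accession"))
                if key is not None:
                    values[key] = float(child.get("value"))
            if "target" in values:
                lower = values["target"] - values.get("lower_offset", 0.0)
                upper = values["target"] + values.get("upper_offset", 0.0)
                windows[(round(lower, 2), round(upper, 2))] += 1
        elif name == "spectrum":
            elem.clear()  # release peak data once the spectrum has been read
    return sorted(windows.items())
```

Calling isolation_windows on the path of an .mzML file gives one entry per distinct window with the number of times it was seen. An empty result would suggest that the file carries no usable isolation window information.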

Best, Vadim

vidyavenkatraman commented 3 years ago

Redoing the ProteoWizard conversion with the recommended settings seems to have fixed the issue.
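The thread does not say which conversion setting was at fault, so the following is only a generic sanity check one might run on a converted file before handing it to DIA-NN. It counts spectra by MS level and by centroid/profile flag, using the PSI-MS accessions MS:1000511 (ms level), MS:1000127 (centroid spectrum) and MS:1000128 (profile spectrum). The function name summarise_spectra is hypothetical, and the sketch assumes these parameters are written directly under each spectrum element rather than through a referenceable parameter group.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MS_LEVEL = "MS:1000511"   # ms level
CENTROID = "MS:1000127"   # centroid spectrum
PROFILE = "MS:1000128"    # profile spectrum


def summarise_spectra(mzml_source):
    """Count spectra per (ms level, 'centroid' | 'profile' | 'unknown')."""
    counts = Counter()
    for _, elem in ET.iterparse(mzml_source, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] != "spectrum":
            continue
        level, mode = "unknown", "unknown"
        for child in elem:  # direct children of <spectrum> only
            accession = child.get("accession")
            if accession == MS_LEVEL:
                level = child.get("value")
            elif accession == CENTROID:
                mode = "centroid"
            elif accession == PROFILE:
                mode = "profile"
        counts[(level, mode)] += 1
        elem.clear()  # release peak data once the spectrum has been counted
    return dict(counts)
```

A file with no MS2 spectra, or with spectra flagged as profile when centroided data was intended, would stand out immediately in the returned counts.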

vdemichev / DiaNN

DIA-NN 1.7.12 on linux throws WARNING: at least one library precursor had non-positive charge; corrected to charge = 2 #136