Difference between Windows GUI and Linux CLI builds in library-free mode

coldfire79 commented 2 years ago

Hi, I am testing the library-free search in DIA-NN. I ran DIA-NN on Windows and Linux with the same input files, FASTA file and parameter settings, and found that the in silico libraries differ in the number of precursors, proteins and genes.

Windows: DIA-NN.1_8.Setup.exe
- 10,769,188 precursors generated
- Library contains 20339 proteins, and 20136 genes
Linux (Ubuntu 18.04): diann_1.8.deb
- 8,748,279 precursors generated
- Library contains 20321 proteins, and 30620 genes

uniprot_human.fasta.gz: fasta file that I am using.

I am wondering where this difference is coming from.

vdemichev commented 2 years ago

What do the logs look like?

coldfire79 commented 2 years ago

On Windows:

DIA-NN 1.8 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 28 2021 14:55:31
Current date and time: Tue Aug  2 17:39:26 2022
CPU: GenuineIntel Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
SIMD instructions: AVX AVX2 AVX512CD AVX512F FMA SSE4.1 SSE4.2 
Logical CPU cores: 36
Thread number set to 35
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 2
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 1
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Mass accuracy will be fixed to 2e-05 (MS2) and 2e-05 (MS1)
The following variable modifications will be scored: UniMod:1 

3 files will be processed
[0:00] Loading FASTA C:\Users\Administrator\Downloads\DIANN-Test\uniprot_human.fasta
[0:18] Processing FASTA
[0:44] Assembling elution groups
[1:08] 10769188 precursors generated
[1:08] Gene names missing for some isoforms
[1:08] Library contains 20339 proteins, and 20136 genes
[1:09] Encoding peptides for spectra and RTs prediction
[1:34] Predicting spectra and IMs
[81:14] Predicting RTs
[94:08] Decoding predicted spectra and IMs
[94:21] Decoding RTs
[94:35] Saving the library to lib.predicted.speclib
[95:02] Initialising library

On Ubuntu:

DIA-NN 1.8 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 28 2021 10:59:57
Current date and time: Wed Aug  3 21:52:47 2022
Logical CPU cores: 36
Thread number set to 34
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 2
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 1
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Mass accuracy will be fixed to 2e-05 (MS2) and 2e-05 (MS1)
The following variable modifications will be scored: UniMod:1 

3 files will be processed
[0:00] Loading FASTA ./uniprot_human.fasta
[0:11] Processing FASTA
[0:27] Assembling elution groups
[0:42] 8748279 precursors generated
[0:42] Gene names missing for some isoforms
[0:42] Library contains 20321 proteins, and 30620 genes
[0:43] Encoding peptides for spectra and RTs prediction
[0:58] Predicting spectra and IMs
[12:36] Predicting RTs
[13:39] Decoding predicted spectra and IMs
[13:56] Decoding RTs
[14:01] Saving the library to diann-test/report-lib.predicted.speclib
[14:15] Initialising library
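
To compare two such logs without reading them line by line, the summary counts can be pulled out with a short script. This is only a minimal sketch: it assumes the log wording shown above ("N precursors generated", "Library contains N proteins, and M genes"), and the helper names `summarize_log` and `diff_summaries` are made up for illustration, not part of DIA-NN.

```python
import re

# Patterns follow the wording of the summary lines in the logs above.
PRECURSORS = re.compile(r"(\d+) precursors generated")
LIBRARY = re.compile(r"Library contains (\d+) proteins, and (\d+) genes")


def summarize_log(text):
    """Extract precursor, protein and gene counts from a DIA-NN log (hypothetical helper)."""
    summary = {}
    m = PRECURSORS.search(text)
    if m:
        summary["precursors"] = int(m.group(1))
    m = LIBRARY.search(text)
    if m:
        summary["proteins"] = int(m.group(1))
        summary["genes"] = int(m.group(2))
    return summary


def diff_summaries(a, b):
    """Return {key: (a_value, b_value)} for every count that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Summary lines copied from the two logs in this thread.
windows_log = """[1:08] 10769188 precursors generated
[1:08] Library contains 20339 proteins, and 20136 genes"""
linux_log = """[0:42] 8748279 precursors generated
[0:42] Library contains 20321 proteins, and 30620 genes"""

differences = diff_summaries(summarize_log(windows_log), summarize_log(linux_log))
for key, (win, lin) in differences.items():
    print(f"{key}: Windows={win} Linux={lin}")
```

With the two excerpts above, all three counts are reported as different, which suggests the divergence already happens at the FASTA processing step, before any spectra are predicted.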

vdemichev commented 2 years ago

This looks like a problem with parsing the FASTA file under Linux. I can reproduce it, but only with the FASTA file you provided; other UniProt FASTAs are fine. Can you please try downloading it from UniProt again? Also, please switch to DIA-NN 1.8.1.
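
Before and after re-downloading, it may help to check the FASTA file for properties that can be handled differently across platforms. The thread does not say what was actually wrong with this file, so the checks below (Windows-style line endings, non-ASCII bytes, duplicate headers) are only common suspects, and `inspect_fasta` and `load_fasta` are hypothetical helpers rather than anything DIA-NN provides.

```python
import gzip
from collections import Counter


def load_fasta(path):
    """Read a FASTA file as raw bytes; .gz files are decompressed first."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return fh.read()


def inspect_fasta(data: bytes) -> dict:
    """Collect simple statistics that can differ between FASTA downloads."""
    crlf = data.count(b"\r\n")
    stats = {
        "entries": 0,                          # lines starting with '>'
        "crlf_line_endings": crlf,             # Windows-style line endings
        "bare_cr": data.count(b"\r") - crlf,   # CR not followed by LF
        "non_ascii_bytes": sum(1 for b in data if b > 127),
        "duplicate_headers": 0,
    }
    headers = Counter()
    for line in data.splitlines():
        if line.startswith(b">"):
            stats["entries"] += 1
            headers[line.split()[0]] += 1      # first token, e.g. >sp|ACC|NAME
    stats["duplicate_headers"] = sum(n - 1 for n in headers.values() if n > 1)
    return stats


# Tiny in-memory example with mixed line endings (made-up entries).
sample = b">sp|P1|A_HUMAN test\r\nMKR\r\n>sp|P2|B_HUMAN test\nMKK\n"
sample_stats = inspect_fasta(sample)
print(sample_stats)
```

Running `inspect_fasta(load_fasta(path))` on both the problematic file and a fresh UniProt download, then comparing the two dictionaries, would show whether they differ in any of these respects.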

coldfire79 commented 2 years ago

Thank you for your answers!

vdemichev / DiaNN

Difference between Windows GUI and Linux CLI builds in library-free mode #460