Linux CLI build not parsing fasta file correctly

gblandsanofi commented 11 months ago

Hi Vadim,

Thank you for this tool. It really helps with our work. We are trying to generate a predicted library from a fasta file in DIANN v1.8.1. This works great in the Windows GUI build but does not work for the Linux CLI build. There was a similar issue that you solved earlier (issue 460). Here are the logs for both Windows and Linux builds:

Windows:

Thread number set to 63
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 2
Min peptide length set to 14
Max peptide length set to 16
Min precursor m/z set to 350
Max precursor m/z set to 1010
Min precursor charge set to 2
Max precursor charge set to 4
Neural networks will be used for peak selection
Protein inference will not be performed
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library

27 files will be processed
[0:00] Loading FASTA fasta.fasta
[0:02] Processing FASTA
[0:03] Assembling elution groups
[0:05] 696225 precursors generated
[0:05] Protein names missing for some isoforms
[0:05] Gene names missing for some isoforms
[0:05] Library contains 0 proteins, and 0 genes
[0:05]
[0:06]
[2:18]
[2:31]
[2:31]
[2:32] Saving the library to G:\DIANN\ADAM9\2023_10_05_SL11B_SL11C\2023_10_05_SL11B_SL11C_ADAM9_lib.predicted.speclib
[2:33] Initialising library

[2:33] First pass: generating a spectral library from DIA data

Linux:

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Wed Oct 11 22:04:19 2023
Logical CPU cores: 64
Thread number set to 63
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 2
Min peptide length set to 14
Max peptide length set to 16
Min precursor m/z set to 350
Max precursor m/z set to 1010
Min precursor charge set to 2
Max precursor charge set to 4
Neural networks will be used for peak selection
Protein inference will not be performed
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library

27 files will be processed
[0:00] Loading FASTA fasta.fasta
[0:01] Processing FASTA
[0:01] Assembling elution groups
[0:01] 3 precursors generated
[0:01] Protein names missing for some isoforms
[0:01] Gene names missing for some isoforms
[0:01] Library contains 0 proteins, and 0 genes
[0:05] Encoding peptides for spectra and RTs prediction
[0:05] Predicting spectra and IMs
[0:05] Predicting RTs
[0:06] Decoding predicted spectra and IMs
[0:06] Decoding RTs
[0:06] Saving the library to test_lib.predicted.speclib
[0:06] Initialising library

I also looked at the predicted spectral library file from the Linux run, and it captured only the last peptide in the FASTA file. I have attached the FASTA file for your reference (fasta_file.zip). Let me know if you need anything else.

vdemichev commented 11 months ago

Would it be possible to replace Windows line endings in the FASTA with Linux line endings? I guess this is likely to help?
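One way to check whether a FASTA file actually carries Windows line endings before changing it is to count the line-ending byte sequences. The following is only a minimal sketch: the function names `count_line_endings` and `has_windows_line_endings` are hypothetical helpers introduced for illustration and are not part of DIA-NN.

```python
from pathlib import Path


def count_line_endings(data: bytes) -> dict:
    """Count CRLF (Windows), bare LF (Linux), and bare CR endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # CR bytes that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_windows_line_endings(path: str) -> bool:
    """Return True if the file at `path` contains at least one CRLF sequence."""
    return count_line_endings(Path(path).read_bytes())["crlf"] > 0
```

For example, `has_windows_line_endings("fasta.fasta")` would report whether the file named in the logs above needs converting, assuming it is in the current directory.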

gblandsanofi commented 11 months ago

Hi, it seems that is the case. I had to open and resave the FASTA file in Linux, and that seems to work. Thank you! I am closing this issue now.
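For anyone who prefers to script the conversion instead of resaving the file by hand, the fix can be sketched in a few lines of Python. This is an illustrative sketch only: `to_unix_line_endings` and `convert_fasta` are hypothetical names, and the script simply rewrites the bytes without knowing anything about how DIA-NN parses the file.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace CRLF (and any remaining bare CR) with LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_fasta(src: str, dst: str) -> int:
    """Write a copy of `src` with Linux line endings to `dst`.

    Returns the number of bytes removed, which equals the number of
    CRLF sequences that were converted.
    """
    raw = Path(src).read_bytes()
    fixed = to_unix_line_endings(raw)
    Path(dst).write_bytes(fixed)
    return len(raw) - len(fixed)
```

A call such as `convert_fasta("fasta.fasta", "fasta_unix.fasta")` (the output name is an assumption) leaves the original untouched and writes a converted copy. The standard `dos2unix` utility performs the same conversion from the shell.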

vdemichev / DiaNN

Linux CLI build not parsing fasta file correctly #826