vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Linux CLI build not parsing fasta file correctly #826

Closed gblandsanofi closed 11 months ago

gblandsanofi commented 11 months ago

Hi Vadim,

Thank you for this tool. It really helps with our work. We are trying to generate a predicted library from a FASTA file in DIA-NN v1.8.1. This works great in the Windows GUI build but does not work in the Linux CLI build. There was a similar issue that you solved earlier (issue 460). Here are the logs for both the Windows and Linux builds:

Windows:

Thread number set to 63
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 2
Min peptide length set to 14
Max peptide length set to 16
Min precursor m/z set to 350
Max precursor m/z set to 1010
Min precursor charge set to 2
Max precursor charge set to 4
Neural networks will be used for peak selection
Protein inference will not be performed
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library

27 files will be processed
[0:00] Loading FASTA fasta.fasta
[0:02] Processing FASTA
[0:03] Assembling elution groups
[0:05] 696225 precursors generated
[0:05] Protein names missing for some isoforms
[0:05] Gene names missing for some isoforms
[0:05] Library contains 0 proteins, and 0 genes
[0:05]
[0:06]
[2:18]
[2:31]
[2:31]
[2:32] Saving the library to G:\DIANN\ADAM9\2023_10_05_SL11B_SL11C\2023_10_05_SL11B_SL11C_ADAM9_lib.predicted.speclib
[2:33] Initialising library

[2:33] First pass: generating a spectral library from DIA data

Linux:

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Wed Oct 11 22:04:19 2023
Logical CPU cores: 64
Thread number set to 63
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 2
Min peptide length set to 14
Max peptide length set to 16
Min precursor m/z set to 350
Max precursor m/z set to 1010
Min precursor charge set to 2
Max precursor charge set to 4
Neural networks will be used for peak selection
Protein inference will not be performed
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library

27 files will be processed
[0:00] Loading FASTA fasta.fasta
[0:01] Processing FASTA
[0:01] Assembling elution groups
[0:01] 3 precursors generated
[0:01] Protein names missing for some isoforms
[0:01] Gene names missing for some isoforms
[0:01] Library contains 0 proteins, and 0 genes
[0:05] Encoding peptides for spectra and RTs prediction
[0:05] Predicting spectra and IMs
[0:05] Predicting RTs
[0:06] Decoding predicted spectra and IMs
[0:06] Decoding RTs
[0:06] Saving the library to test_lib.predicted.speclib
[0:06] Initialising library

I also looked at the predicted spectral library file from the Linux run, and it contained only the last peptide in the FASTA file. I have attached the FASTA file for your reference. Let me know if you need anything else.

fasta_file.zip

vdemichev commented 11 months ago

Would it be possible to replace Windows line endings in the FASTA with Linux line endings? I guess this is likely to help?
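
For anyone who wants to script the conversion, here is a minimal Python sketch that rewrites a file with Unix (LF) line endings. It is not part of DIA-NN; the function names `normalize_line_endings` and `convert_fasta` are made up for illustration, and it assumes the file is small enough to read into memory in one go.

```python
from pathlib import Path
import sys


def normalize_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and bare CR line endings to Unix (LF)."""
    # Replace CRLF first so that each pair becomes a single LF,
    # then convert any remaining lone CR characters.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_fasta(src: str, dst: str) -> None:
    """Read src as raw bytes, normalise line endings, write the result to dst."""
    # Working on bytes avoids any implicit newline translation by Python.
    Path(dst).write_bytes(normalize_line_endings(Path(src).read_bytes()))


if __name__ == "__main__":
    # Hypothetical usage: python convert_fasta.py input.fasta output.fasta
    if len(sys.argv) == 3:
        convert_fasta(sys.argv[1], sys.argv[2])
```

Where it is installed, the `dos2unix` command-line tool does the same job on a file in place.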

gblandsanofi commented 11 months ago

Hi, it seems that is the case. I had to open and resave the FASTA file on Linux, and that seems to work. Thank you! I am closing this issue now.
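
For readers who hit the same symptom, a quick way to check a FASTA file before running the Linux build is to count which line-ending styles it contains. This is only a sketch, not part of DIA-NN, and the helper name `count_line_endings` is made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR line endings in raw file contents."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF also contains one LF and one CR, so subtract those.
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_endings.py input.fasta
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as handle:
            print(count_line_endings(handle.read()))
```

A non-zero `crlf` count means the file still has Windows line endings and should be converted before use.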