vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.
Other

Phospho Predicted Library with Invalid/Unspecified Fragment Charge States (v1.9.2) #1220

Closed: singjc closed this 4 weeks ago

singjc commented 1 month ago

Hi Vadim,

I am trying to process some diaPASEF phospho data in library-free mode (v1.9.2), but I run into an error saying that the predicted library contains fragments with invalid/unspecified charges. Why would the predicted library contain invalid/unspecified charge states, and how can I fix this? Is it possible to convert the .speclib to TSV so I can manually remove the invalid entries? I tried loading the .speclib into Skyline via transition list import, but it failed to load.
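On the conversion question: once a library is in tab-separated form, dropping fragments with unusable charges is a simple row filter. The sketch below assumes a TSV library with a `FragmentCharge` column (an assumed column name; check the header of your own file) and treats any value that is not an integer between 1 and a chosen maximum as invalid. It does not read the binary .speclib format itself, and the helper names and the default upper bound are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

# Assumed column name; adjust to match the header of your library export.
CHARGE_COLUMN = "FragmentCharge"


def is_valid_charge(value: Optional[str], max_charge: int = 4) -> bool:
    """Return True if value parses as an integer charge in 1..max_charge."""
    try:
        charge = int(value)
    except (TypeError, ValueError):
        # Empty, missing or non-numeric charges count as invalid.
        return False
    return 1 <= charge <= max_charge


def split_by_charge(rows: Iterable[dict], max_charge: int = 4) -> Tuple[List[dict], List[dict]]:
    """Partition library rows into (valid, invalid) by fragment charge."""
    valid, invalid = [], []
    for row in rows:
        ok = is_valid_charge(row.get(CHARGE_COLUMN), max_charge)
        (valid if ok else invalid).append(row)
    return valid, invalid


if __name__ == "__main__":
    # Tiny in-memory table standing in for a real TSV library file.
    sample = (
        "ModifiedPeptide\tPrecursorCharge\tProductMz\tFragmentCharge\n"
        "PEPTIDEK\t2\t500.25\t1\n"
        "PEPTIDEK\t2\t250.63\t0\n"
        "PEPTIDEK\t2\t333.50\t\n"
    )
    reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
    valid, invalid = split_by_charge(reader)
    print(len(valid), len(invalid))  # 1 2
```

The valid rows could then be written back out with `csv.DictWriter` using the same delimiter.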

Best,

Justin

Command

diann.exe --f "G:\raw\20240909_tims_Evo_CDS_TechDev_C1-1_6436.d" ^
 --f "G:\raw\20240909_tims_Evo_CDS_TechDev_C1-2_6437.d" ^
 --f "G:\raw\20240909_tims_Evo_CDS_TechDev_C5_6435.d" ^
 --f "G:\raw\20240909_tims_Evo_CDS_TechDev_T1-1_6441.d" ^
 --f "G:\raw\20240909_tims_Evo_CDS_TechDev_T1-2_6442.d" ^
 --f "G:\raw\20240909_tims_Evo_CDS_TechDev_T5_6440.d" ^
 --lib "" --threads 16 --verbose 1 --out "C:\DIA-NN\1.9.2/report.tsv" --qvalue 1 --matrices --out-lib "C:\DIA-NN\1.9.2\report-lib.parquet" --gen-spec-lib --predictor --xic --fasta "G:\20241021_diann_1.9.2_lib_free\uniprotkb_Human_AND_reviewed_true_AND_m_2024_10_05.fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 2 --max-pr-charge 3 --cut K*,R* --missed-cleavages 1 --unimod4 --var-mods 3 --var-mod UniMod:21,79.966331,STY --mass-acc 15.0 --mass-acc-ms1 15.0 --individual-mass-acc --individual-windows --no-prot-inf --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling --no-norm

log (verbose 1)

DIA-NN 1.9.2 (Data-Independent Acquisition by Neural Networks)
Compiled on Oct 17 2024 21:58:43
Current date and time: Mon Oct 21 08:12:18 2024
CPU: GenuineIntel Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz
SIMD instructions: AVX AVX2 SSE4.1 SSE4.2 
Logical CPU cores: 16
Thread number set to 16
Output will be filtered at 1 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
XICs within 10 seconds from the apex will be extracted for each precursor and saved in .parquet format, a folder will be created next to the main report for the XICs storage
DIA-NN will carry out FASTA digest for in silico lib generation
Min fragment m/z set to 200
Max fragment m/z set to 1800
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 2
Max precursor charge set to 3
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 1
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Mass accuracy will be determined separately for different runs
Scan windows will be inferred separately for different runs
Protein inference will not be performed
Peptidoform scoring enabled
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers, GO/pathway and system-scale analyses
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Normalisation disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
WARNING: it is strongly recommended to first generate an in silico-predicted library in a separate pipeline step and then use it to process the raw data, now without activating FASTA digest
The following variable modifications will be scored: UniMod:21 
WARNING: it is strongly recommended to keep the q-value threshold at 5% or below when generating a spectral library from DIA data.

6 files will be processed
[0:00] Loading FASTA G:\20241021_diann_1.9.2_lib_free\uniprotkb_Human_AND_reviewed_true_AND_m_2024_10_05.fasta
[0:13] Processing FASTA
[1:42] Assembling elution groups
[2:27] 29190979 precursors generated
[2:27] Gene names missing for some isoforms
[2:27] Library contains 20404 proteins, and 20191 genes
[3:06] Encoding peptides for spectra and RTs prediction
[4:17] Predicting spectra and IMs
[142:43] Predicting RTs
[164:58] Decoding predicted spectra and IMs
[165:54] Decoding RTs
[165:58] Saving the library to C:\DIA-NN\1.9.2\report-lib.predicted.speclib
[167:50] Initialising library
[169:55] Loading spectral library C:\DIA-NN\1.9.2\report-lib.predicted.speclib
[170:30] Library annotated with sequence database(s): G:\20241021_diann_1.9.2_lib_free\uniprotkb_Human_AND_reviewed_true_AND_m_2024_10_05.fasta
[170:30] Assembling elution groups
ERROR: library contains fragments with invalid/unspecified charges

DIA-NN exited
DIA-NN-plotter.exe "C:\DIA-NN\1.9.2\report.stats.tsv" "C:\DIA-NN\1.9.2/report.tsv" "C:\DIA-NN\1.9.2\report.pdf"
PDF report will be generated in the background
vdemichev commented 1 month ago

Hi Justin,

Can you please share the FASTA file?

Best, Vadim

singjc commented 1 month ago

Sure, I attached it below. I retrieved the FASTA from UniProt (human Swiss-Prot proteins).

uniprotkb_Human_AND_reviewed_true_AND_m_2024_10_05.zip

vdemichev commented 1 month ago

Thank you! I cannot reproduce it. Based on what the log shows, it looks like there may have been a disk-writing problem. Please generate the library in a separate pipeline step anyway; does it work OK then?
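The recommendation above, which matches the WARNING printed in the log, is to split the run into two DIA-NN invocations: first predict the library from the FASTA without any raw files, then search the raw files against that library with FASTA digest switched off. Below is a minimal Python sketch that assembles both argument lists from options used in this thread. The helper `build_two_step_commands` is hypothetical, it only builds argument lists (it never runs DIA-NN), and the `.predicted.speclib` path is inferred from the file name seen in the log rather than from documented behaviour. Which of the remaining options belong to which step is left to the caller.

```python
import ntpath
from typing import List, Tuple


def build_two_step_commands(
    raw_files: List[str],
    fasta: str,
    lib_out: str,
    report: str,
    digest_args: List[str],
    search_args: List[str],
) -> Tuple[List[str], List[str]]:
    """Return (library_step, search_step) argument lists for diann.exe.

    Hypothetical helper: reuses flags from the original command but
    separates in silico prediction from the raw-data search.
    """
    # Step 1: FASTA digest + deep-learning prediction; no --f raw files.
    library_step = [
        "diann.exe",
        "--fasta", fasta,
        "--fasta-search",
        "--predictor",
        "--gen-spec-lib",
        "--out-lib", lib_out,
        *digest_args,
    ]

    # Assumption taken from the log: the predicted library was written next to
    # --out-lib as "<stem>.predicted.speclib". ntpath handles Windows paths
    # even when this script runs on Linux.
    stem, _ext = ntpath.splitext(lib_out)
    predicted_lib = stem + ".predicted.speclib"

    # Step 2: search the raw files against the predicted library.
    search_step = ["diann.exe"]
    for path in raw_files:
        search_step += ["--f", path]
    search_step += [
        "--lib", predicted_lib,
        "--fasta", fasta,  # no --fasta-search here, so no digest is performed
        "--out", report,
        *search_args,
    ]
    return library_step, search_step
```

Each returned list can be passed unchanged to `subprocess.run` on the machine where diann.exe is installed, pointing `lib_out` and `report` at a folder with enough free storage.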

singjc commented 1 month ago

Ahh, thank you, I think you helped me figure out the problem. I am running DIA-NN in a Windows virtual machine on an Ubuntu host, and I think it tried to write the library to the VM's own disk instead of the shared folder on the host machine (I probably did not allocate enough virtual storage to the VM).

I just reran library generation as a separate pipeline step, writing to the shared host folder, and the library was generated successfully.
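In this thread the apparent cause, insufficient storage on the VM's virtual disk, only showed up as a library error once the file was read back. A cheap safeguard is to check free space in the output folder before starting a long prediction run. The sketch below uses Python's standard `shutil.disk_usage`; the helper name is hypothetical, and the required size is a caller-supplied estimate, since the thread does not say how large the predicted library was.

```python
import shutil
import tempfile


def has_enough_space(folder: str, required_bytes: int, margin: float = 0.1) -> bool:
    """Return True if `folder` has at least required_bytes free plus a safety margin.

    Hypothetical pre-flight check; `required_bytes` is an estimate supplied by
    the caller, not something DIA-NN reports.
    """
    usage = shutil.disk_usage(folder)  # named tuple: total, used, free (bytes)
    return usage.free >= required_bytes * (1 + margin)


if __name__ == "__main__":
    out_dir = tempfile.gettempdir()  # stand-in for the real output folder
    needed = 20 * 1024**3  # illustrative 20 GiB estimate, not a measured size
    if has_enough_space(out_dir, needed):
        print(f"{out_dir} has enough free space.")
    else:
        print(f"Not enough free space in {out_dir}; choose another output folder.")
```

Run against the VM's local folder and the shared host folder, a check like this would show which location can hold the library before DIA-NN starts writing.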

Thanks!