vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Unknown Fragment type #673

Open TeckYewLow opened 1 year ago

TeckYewLow commented 1 year ago

Hi.

After analysis of my file, I got this message: "WARNING: unknown fragment type VHS2_c2riboseqorf5uORF-; for fragments containing the N-terminus please specify the fragment type as 'b', for fragments containing the C-terminus - as 'y'; it's essential to use a properly annotated library for reliable analysis"

Is that because my FASTA entry is too short and non-tryptic?

>tr|VHS2_c2riboseqorf5uORF-|KIDINS220_HUMAN ENST00000256707 NM_001348729.2 OS=Homo sapiens OX=9606 GN=KIDINS220 PE=0 SV=0
MAAGCGEGDALAVAVSCFPVL
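For reference, the two properties asked about here can be checked directly. The sketch below is only an illustration, not anything DIA-NN does internally: `parse_fasta_header` and `internal_tryptic_sites` are hypothetical helper names, and the cleavage rule used (after K or R, except before P) is the common trypsin convention. It shows that the accession field of this entry is the same string that appears in the warning, and that the 21-residue sequence contains no K or R.

```python
def parse_fasta_header(header):
    """Split a UniProt-style header of the form '>db|accession|entry_name description'."""
    first_token = header.lstrip(">").split(None, 1)[0]
    parts = first_token.split("|")
    if len(parts) != 3:
        raise ValueError("expected a 'db|accession|entry_name' identifier")
    db, accession, entry_name = parts
    return {"db": db, "accession": accession, "entry_name": entry_name}


def internal_tryptic_sites(sequence):
    """Return 0-based positions of K/R residues that are not followed by P.

    The last residue is skipped because cleaving after it would not
    produce a second peptide.
    """
    return [
        i
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    ]


header = (
    ">tr|VHS2_c2riboseqorf5uORF-|KIDINS220_HUMAN ENST00000256707 "
    "NM_001348729.2 OS=Homo sapiens OX=9606 GN=KIDINS220 PE=0 SV=0"
)
sequence = "MAAGCGEGDALAVAVSCFPVL"

fields = parse_fasta_header(header)
print(fields["accession"])               # the string quoted in the warning
print(len(sequence))                     # sequence length
print(internal_tryptic_sites(sequence))  # empty list: no K or R in the sequence
```

This only confirms the facts in the question; it does not by itself explain why the string ends up being read as a fragment type.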

vdemichev commented 1 year ago

What do the logs for all steps of the analysis look like?

TeckYewLow commented 1 year ago

Hi Vadim,

No log file was generated. I copied this information from the real-time window of the run:

diann.exe --f "D:\DIATEST\Plasma\plasma_20min_1ug_inj1.wiff " --f "D:\DIATEST\Plasma\plasma_20min_1ug_inj2.wiff " --f "D:\DIATEST\Plasma\plasma_20min_1ug_inj3.wiff " --lib "F:\MSFragger_FASTA\HUMAN_DB\Human_SP_VHS_Oprot1pep2pep_Combined_20230424_REAL.predicted.speclib" --threads 4 --verbose 1 --out "D:\DIATEST\Plasma\report.tsv" --qvalue 0.01 --matrices --min-corr 2.0 --corr-diff 1.0 --time-corr-only --extracted-ms1 --out-lib "D:\DIATEST\Plasma\report-lib.tsv" --gen-spec-lib --prosit --var-mods 1 --use-quant --reanalyse --relaxed-prot-inf --smart-profiling --pg-level 1 --peak-center --no-ifs-removal
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Wed Apr 26 17:46:52 2023
CPU: GenuineIntel Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2
Logical CPU cores: 8
Thread number set to 4
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Only peaks with correlation sum exceeding 2 will be considered
Peaks with correlation sum below 1 from maximum will not be considered
A single score will be used until RT alignment to save memory; this can potentially lead to slower search
Fast algorithm based on MS1 feature extraction for quicker library-free search will be applied; this significantly worsens the identification performance
A spectral library will be generated
Maximum number of variable modifications set to 1
Existing .quant files will be used
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Implicit protein grouping: protein names; this determines which peptides are considered 'proteotypic' and thus affects protein FDR calculation
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

3 files will be processed
[0:00] Loading spectral library F:\MSFragger_FASTA\HUMAN_DB\Human_SP_VHS_Oprot1pep2pep_Combined_20230424_REAL.predicted.speclib
[0:09] Library annotated with sequence database(s): F:\MSFragger_FASTA\HUMAN_DB\Human_SP_VHS_Oprot1pep2pep_Combined_20230424_REAL.fasta
[0:09] Gene names missing for some isoforms
[0:09] Library contains 51729 proteins, and 37159 genes
[0:10] Spectral library loaded: 70841 protein isoforms, 87935 protein groups and 4204254 precursors in 1806898 elution groups.
[0:10] Preparing Prosit input from the spectral library provided
[0:16] Prosit input saved to D:\DIATEST\Plasma\report-lib.prosit.csv
[0:16] Initialising library

[0:20] First pass: generating a spectral library from DIA data
[0:20] Cross-run analysis
[0:20] Reading quantification information: 3 files
[0:20] Quantifying peptides
[0:20] Assembling protein groups
[0:23] Quantifying proteins
[0:24] Calculating q-values for protein and gene groups
[0:24] Calculating global q-values for protein and gene groups
[0:24] Writing report
[0:25] Report saved to D:\DIATEST\Plasma\report-first-pass.tsv.
[0:25] Saving precursor levels matrix
[0:25] Precursor levels matrix (1% precursor and protein group FDR) saved to D:\DIATEST\Plasma\report-first-pass.pr_matrix.tsv.
[0:25] Saving protein group levels matrix
[0:25] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to D:\DIATEST\Plasma\report-first-pass.pg_matrix.tsv.
[0:25] Saving gene group levels matrix
[0:25] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to D:\DIATEST\Plasma\report-first-pass.gg_matrix.tsv.
[0:25] Saving unique genes levels matrix
[0:25] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to D:\DIATEST\Plasma\report-first-pass.unique_genes_matrix.tsv.
[0:25] Stats report saved to D:\DIATEST\Plasma\report-first-pass.stats.tsv
[0:25] Generating spectral library:
[0:25] 3196 precursors passing the FDR threshold are to be extracted
[0:25] Loading run D:\DIATEST\Plasma\plasma_20min_1ug_inj1.wiff
[5:42] 3539374 library precursors are potentially detectable
[5:42] 312 spectra added to the library
[5:43] Loading run D:\DIATEST\Plasma\plasma_20min_1ug_inj2.wiff
[10:55] 3539374 library precursors are potentially detectable
[10:55] 2202 spectra added to the library
[10:56] Loading run D:\DIATEST\Plasma\plasma_20min_1ug_inj3.wiff
[16:09] 3539374 library precursors are potentially detectable
[16:09] 199 spectra added to the library
[16:09] Saving spectral library to D:\DIATEST\Plasma\report-lib.tsv
[16:10] 3196 precursors saved
[16:10] Loading the generated library and saving it in the .speclib format
[16:10] Loading spectral library D:\DIATEST\Plasma\report-lib.tsv
WARNING: unknown fragment type VHS2_c2riboseqorf5uORF-; for fragments containing the N-terminus please specify the fragment type as 'b', for fragments containing the C-terminus - as 'y'; it's essential to use a properly annotated library for reliable analysis

DIA-NN exited
DIA-NN-plotter.exe "D:\DIATEST\Plasma\report.stats.tsv" "D:\DIATEST\Plasma\report.tsv" "D:\DIATEST\Plasma\report.pdf"
PDF report will be generated in the background
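In the output above, the warning is printed right after "Loading spectral library D:\DIATEST\Plasma\report-lib.tsv", so one way to narrow this down might be to look at which rows of that file carry something other than 'b' or 'y' as the fragment type. The following is a rough sketch only: it assumes the library is a tab-separated file with a header row and a column named `FragmentType` (an assumption, please check the actual header of the file), and `unexpected_fragment_types` is a hypothetical helper, not part of DIA-NN.

```python
import csv
from collections import Counter


def unexpected_fragment_types(lines, column="FragmentType", allowed=("b", "y")):
    """Count values in `column` that are not among the allowed fragment types.

    `lines` is any iterable of text lines (for example an open file).
    The column name is an assumption about the library layout.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header: {reader.fieldnames}")
    counts = Counter()
    for row in reader:
        value = row.get(column) or ""  # short rows yield None for missing cells
        if value not in allowed:
            counts[value] += 1
    return counts


# Small in-memory example with made-up rows, only to show the behaviour.
sample = [
    "PrecursorMz\tFragmentType\tFragmentCharge\n",
    "500.1\tb\t1\n",
    "500.1\ty\t1\n",
    "600.2\tVHS2_c2riboseqorf5uORF-\t1\n",
]
print(unexpected_fragment_types(sample))
```

To run it on the real library, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`. If the unexpected values turn out to be protein identifiers, that would suggest the columns in some rows are shifted, but that is a guess until the file itself is inspected.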