Open liv-acollins opened 2 years ago
Means protein IDs in the library don't correspond to protein IDs in the FASTA. If the protein annotation in the main output report (not the matrices) looks fine, no need to do anything about it, I guess.
I thought that might be the case.
From the spectral library:
778.4129855 427.22995924 57.3 4_AAAAAAAAAAAAAAAASAGGK_2 -1 8319.7 1_AAAAAAAAAAAAAAAASAGGK_2 0 AAAAAAAAAAAAAAAASAGGK 1/P0CG40 b6/-0.006,b12^2/-0.006,m11:16/-0.006 AAAAAAAAAAAAAAAASAGGK 2 light 1/P0CG40 b 1 6
From the FASTA:
>sp|P0CG40|SP9_HUMAN Transcription factor Sp9 OS=Homo sapiens OX=9606 GN=SP9 PE=3 SV=1
In this case, I assume DIA-NN isn't extracting the identifier to match? I assume altering the FASTA or spectral library would fix this. Is that correct?
DIA-NN reads "1/P0CG40" from the library and cannot find this in FASTA, yes.
Basically, DIA-NN does not support this '1/', '2/', etc. syntax.
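One possible workaround is to strip the count prefix from the protein fields of the library before passing it to DIA-NN. The sketch below is only an illustration under stated assumptions: the helper names `strip_count_prefix` and `rewrite_library` are hypothetical, the column names `ProteinName` and `UniprotID` and the tab delimiter are guesses about the library layout (check the header of your own file), and joining multiple IDs with `;` assumes that is the separator DIA-NN expects.

```python
import csv
import re

# Matches a leading protein count such as "1/" or "2/".
_COUNT_PREFIX = re.compile(r"^\d+/")


def strip_count_prefix(protein_field):
    """Turn '1/P0CG40' into 'P0CG40' and '2/A/B' into 'A;B'."""
    field = protein_field.strip()
    if not _COUNT_PREFIX.match(field):
        return field  # no count prefix: leave the value untouched
    ids = _COUNT_PREFIX.sub("", field).split("/")
    # Assumption: multiple IDs should be joined with ';'.
    return ";".join(i for i in ids if i)


def rewrite_library(src, dst, columns=("ProteinName", "UniprotID"), delimiter="\t"):
    """Copy a delimited library file, cleaning the given protein columns.

    The column names and delimiter are assumptions; adjust them to match
    the header line of the actual library file.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter=delimiter)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter=delimiter)
        writer.writeheader()
        for row in reader:
            for col in columns:
                if row.get(col):
                    row[col] = strip_count_prefix(row[col])
            writer.writerow(row)
```

With this in place, `strip_count_prefix("1/P0CG40")` gives `P0CG40`, which is the accession found in the FASTA header above. Editing the library rather than the FASTA keeps the FASTA in its standard UniProt form.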
Hi @vdemichev,
My log shows the following:
[0:00] Loading spectral library phl004_canonical_sall_osw.csv
WARNING: no neutral loss information found in the library - assuming fragments without losses
[0:06] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[0:06] Spectral library loaded: 14158 protein isoforms, 14158 protein groups and 211370 precursors in 159345 elution groups.
[0:06] Loading protein annotations from FASTA UP000005640_9606.fasta
[0:06] Annotating library proteins with information from the FASTA database
[0:06] Protein names missing for some isoforms
[0:06] Gene names missing for some isoforms
[0:06] Library contains 0 proteins, and 0 genes
[0:06] Initialising library
[0:06] Saving the library to phl004_canonical_sall_osw.csv.speclib
Later followed by:
[3:52] Number of IDs at 0.01 FDR: 3359
[3:52] Calculating protein q-values
[3:52] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
I assume this is because the protein name is not being read from my spectral library. I'm using the following library:
https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/GetDIALibs
Human Pan-Human library, 12,046 proteins, phl004_canonical_sall_osw.csv
For the FASTA, I'm using the latest UniProt Human (one protein per gene) FASTA.
My command is:
diann-1.8 \
  --dir "$MZML_DIR" \
  --lib "$SPEC_LIB" \
  --threads $SLURM_CPUS_ON_NODE \
  --verbose 1 \
  --out "$OUTPUT_PATH" \
  --qvalue 0.01 \
  --matrices \
  --fasta "$FASTA_PATH" \
  --met-excision \
  --cut K,R \
  --reanalyse \
  --smart-profiling
My final .pg_matrix does have protein results, but my .gg_matrix is empty.
If I understand correctly, it looks like it is able to read the protein identifiers from the spectral library, but I suspect it is unable to read the FASTA file and match the identifiers. What am I missing?
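One way to test this suspicion is to compare the protein identifiers from the library against the accessions in the FASTA headers. The following is a minimal sketch, not anything DIA-NN itself does: the helper names `fasta_accessions` and `unmatched_ids` are hypothetical, and the regular expression assumes UniProt-style headers of the form `>sp|ACCESSION|NAME ...` or `>tr|ACCESSION|NAME ...`.

```python
import re

# UniProt-style header, e.g. ">sp|P0CG40|SP9_HUMAN ..." -> accession "P0CG40".
_HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|")


def fasta_accessions(lines):
    """Collect accessions from an iterable of FASTA lines."""
    accessions = set()
    for line in lines:
        match = _HEADER.match(line)
        if match:
            accessions.add(match.group(1))
    return accessions


def unmatched_ids(library_ids, accessions):
    """Return the library protein IDs that have no exact match in the FASTA."""
    return sorted(set(library_ids) - accessions)
```

For example, calling `fasta_accessions(open(path))` on the FASTA (where `path` is a placeholder for your file) and passing the library's protein column to `unmatched_ids` lists every identifier that cannot be matched exactly. If most of the unmatched entries share a common pattern, that pattern is a likely cause of the missing annotation.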