Closed by patrick-willems 1 month ago
Hi Patrick,
Can you please share the library in .tsv format (can be just a single peptide in there, does not need to be a full one) and the DIA-NN log?
Best, Vadim
Hey Vadim,
It rather seems to be a Linux/Docker issue: I just tested on Windows, and there the peptides were reannotated correctly. So from my side it is fine now and I will just use a Windows system. For completeness, here are the library and log from Linux:
Lib TSV (one peptide - note that I still need to convert CCS to 1/K0):
ModifiedPeptide StrippedPeptide PrecursorCharge PrecursorMz IonMobility iRT ProteinId RelativeFragmentIntensity FragmentMz FragmentType FragmentNumber FragmentCharge FragmentLossType
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.00023522 100.07564545 b 1 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.00167886 187.10766602 b 2 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.10095927 288.15533447 b 3 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.14474481 387.22375488 b 4 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.18281789 474.25576782 b 5 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.77617210 603.29840088 b 6 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 1.00000000 716.38244629 b 7 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.33714399 815.45086670 b 8 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.00000000 120.06547546 y 1 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.00683188 219.13389587 y 2 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.00981588 332.21792603 y 3 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.02758598 461.26052856 y 4 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.11619335 548.29260254 y 5 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.03877397 647.36102295 y 6 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.02824988 748.40869141 y 7 1 noloss
VSTVSELVT VSTVSELVT 1 934.50970227 281.41616821 44.32462692 0.01109457 835.44073486 y 8 1 noloss
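For the CCS to 1/K0 conversion mentioned in the note above the library, a minimal sketch based on the Mason-Schamp equation might look like the following. The drift gas (nitrogen), the temperature of 305 K, and the function name `ccs_to_inv_k0` are assumptions for illustration, not values taken from this thread or from DIA-NN; they should be checked against the instrument settings.

```python
import math

# Mason-Schamp prefactor for CCS in Å², reduced mass in Da, T in K and
# K0 in cm²/(V·s); it follows from e, k_B and the Loschmidt constant.
MASON_SCHAMP_COEF = 18509.8632
N2_MASS = 28.0134    # ASSUMPTION: nitrogen drift gas (Da)
TEMPERATURE = 305.0  # ASSUMPTION: gas temperature (K)


def ccs_to_inv_k0(ccs, charge, mz, gas_mass=N2_MASS, temperature=TEMPERATURE):
    """Convert a CCS value (Å²) to inverse reduced mobility 1/K0 (V·s/cm²)."""
    ion_mass = mz * charge
    reduced_mass = ion_mass * gas_mass / (ion_mass + gas_mass)
    return ccs * math.sqrt(reduced_mass * temperature) / (MASON_SCHAMP_COEF * charge)


# The precursor from the library above: CCS 281.416 Å², charge 1, m/z 934.5097.
inv_k0 = ccs_to_inv_k0(281.41616821, 1, 934.50970227)
print(f"1/K0 = {inv_k0:.3f} V·s/cm²")
```

Under these assumptions the example precursor comes out at roughly 1.38 V·s/cm²; a different gas or temperature shifts the result.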
My log is below:
DIA-NN 1.9.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Jul 15 2024 09:42:01
Current date and time: Wed Oct 16 17:31:37 2024
Logical CPU cores: 32
Library precursors will be reannotated using the FASTA database
Thread number set to 32
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
WARNING: MBR turned off, two or more raw files are required
1 files will be processed
[0:00] Loading spectral library /data/9mers_valid_spectronaut.tsv
[2:42] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[3:03] Spectral library loaded: 0 protein isoforms, 0 protein groups and 9266887 precursors in 8731813 elution groups.
[3:03] Loading FASTA /data/UP000005640_9606_07082024.fasta
[3:26] Reannotating library precursors with information from the FASTA database
[3:29] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[3:29] 9266887 precursors generated
[3:29] Library contains 0 proteins, and 0 genes
[3:31] Initialising library
[3:52] Saving the library to /data/9mers_valid_spectronaut.tsv.skyline.speclib
[3:59] File #1/1
[3:59] Loading run /data/T063656_AurEl8_PM8_DIAIMP_CMB-1691_21_GD5_1_10545.d
[4:47] 9266887 library precursors are potentially detectable
[4:49] Processing...
[128:58] RT window set to 2.51954
[128:58] Ion mobility window set to 0.749911
[128:58] Peak width: 5.008
[128:58] Scan window radius set to 11
[129:00] Recommended MS1 mass accuracy setting: 12.0066 ppm
[272:27] Removing low confidence identifications
[272:28] Removing interfering precursors
[272:42] Training neural networks: 11961 targets, 7869 decoys
[272:50] Number of IDs at 0.01 FDR: 4762
[272:56] No protein annotation, skipping protein q-value calculation
[272:56] Quantification
[272:57] Quantification information saved to /data/T063656_AurEl8_PM8_DIAIMP_CMB-1691_21_GD5_1_10545.d.quant
[272:57] Cross-run analysis
[272:57] Reading quantification information: 1 files
[272:58] Quantifying peptides
[272:58] Quantifying proteins
[272:58] No protein annotation, skipping protein q-value calculation
[272:58] No protein annotation, skipping global protein q-value calculation
[272:58] Compressed report saved to /data/report.parquet. Use R 'arrow' or Python 'PyArrow' package to process
[272:58] Writing report
[272:59] Report saved to /data/report.tsv.
[272:59] Saving precursor levels matrix
[272:59] Precursor levels matrix (1% precursor and protein group FDR) saved to /data/report.pr_matrix.tsv.
[272:59] Saving protein group levels matrix
[272:59] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to /data/report.pg_matrix.tsv.
[272:59] Saving gene group levels matrix
[272:59] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to /data/report.gg_matrix.tsv.
[272:59] Saving unique genes levels matrix
[272:59] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to /data/report.unique_genes_matrix.tsv.
[272:59] Manifest saved to /data/report.manifest.txt
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_M_create
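The line "Library contains 0 proteins, and 0 genes" is the quickest sign in this log that reannotation matched nothing. A small sketch for checking it automatically is given here; the function name `library_annotation_counts` is hypothetical, and the pattern relies only on the wording of the log line shown above.

```python
import re


def library_annotation_counts(log_text):
    """Return (proteins, genes) from the 'Library contains ...' line, or None."""
    match = re.search(r"Library contains (\d+) proteins, and (\d+) genes", log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Excerpt copied from the log above.
log_excerpt = """\
[3:26] Reannotating library precursors with information from the FASTA database
[3:29] 9266887 precursors generated
[3:29] Library contains 0 proteins, and 0 genes
"""

counts = library_annotation_counts(log_excerpt)
if counts == (0, 0):
    print("Reannotation matched no proteins; check the FASTA and digest settings.")
```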
Best, Patrick
Hi Patrick,
Thanks for the info! Could you please attach the .tsv as a file? I just tried to debug, but it's tricky to copy-paste from GitHub while retaining the formatting.
Best, Vadim
Hey,
Yes, the first 1000 lines are here: lib_spectronaut.txt
The FASTA was the human reference UniProtKB.
Thanks!
Thank you!
Looks like a non-tryptic library; I also get zero annotations on Windows. With --cut F,Y,W,M,L,!P specificity I get identical results (2 genes matched) on Windows and Linux.
Are you sure you get different results on Windows & Linux with identical settings? If yes, would you please be able to share the full lib & settings that cause it? Apologies for so many requests.
If you want --reannotate to just match to anything, please use --cut to enable cuts after arbitrary amino acids (to do this, list all AAs in --cut, e.g. --cut A,G,L,I,...).
Best, Vadim
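To illustrate why a non-tryptic library gets no matches under K,R specificity, and how an all-amino-acid --cut list could be assembled, here is a toy sketch. It is not DIA-NN's implementation: `cut_argument`, `matches_digest`, and the protein sequence are made up for illustration, the check ignores exclusion rules such as !P as well as missed-cleavage limits, and the exact --cut syntax should be confirmed against the DIA-NN documentation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cut_argument(residues=AMINO_ACIDS):
    """Build a comma-separated residue list in the style used in this thread."""
    return ",".join(residues)


def matches_digest(protein, peptide, cut_after):
    """True if the peptide occurs in the protein with both ends at allowed cuts."""
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)
        # N-terminus: protein start, or the preceding residue is a cut site.
        n_ok = start == 0 or protein[start - 1] in cut_after
        # C-terminus: protein end, or the peptide's last residue is a cut site.
        c_ok = end == len(protein) or peptide[-1] in cut_after
        if n_ok and c_ok:
            return True
        start = protein.find(peptide, start + 1)
    return False


toy_protein = "MAGKVSTVSELVTPLR"  # HYPOTHETICAL sequence, not from any FASTA
peptide = "VSTVSELVT"

print(matches_digest(toy_protein, peptide, "KR"))         # tryptic: C-terminal T is not a cut site
print(matches_digest(toy_protein, peptide, AMINO_ACIDS))  # cuts allowed after any residue
print("--cut", cut_argument())
```

With K,R specificity the 9-mer is rejected because it ends in T, whereas allowing cuts after every residue lets any substring of a protein match.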
Aah yes, indeed, the cut flag had to be adapted: it was set to the default K,R on my Windows system, while I had it set wrongly on Linux. Thanks for pointing it out and sorry for the inconvenience!
Patrick
Hey Vadim,
First of all, congrats on the 1.9 release - I have already re-analyzed lots of older data, given the improved performance of the new release.
I have a small question regarding the protein reannotation function in DIA-NN. I am using Spectronaut-formatted predicted libraries that look like this:
I activated --reannotate and specified the correct UniProtKB FASTA, hoping that DIA-NN would assign the proteins for me. The log does say that it is reannotating library precursors, but the precursors are then not assigned to any proteins. Is it possible to add proteins to such a library with DIA-NN? Otherwise, I will of course add the columns myself.
Thanks, Patrick
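For the fallback of adding the protein column by hand, a minimal sketch could look like the one below. The helper names `read_fasta` and `annotate`, the toy FASTA entries, and the use of ";" to join multiple accessions are assumptions for illustration; only the UniProt-style `>sp|ACCESSION|NAME` header layout is relied on. A plain substring search like this would be slow for millions of precursors, where an index over the proteome would be the more practical choice.

```python
def read_fasta(lines):
    """Yield (accession, sequence) pairs from an iterable of FASTA lines."""
    accession, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            parts = header.split("|")
            # UniProt headers look like sp|ACCESSION|NAME; otherwise keep the id.
            accession = parts[1] if len(parts) >= 3 else header
            chunks = []
        elif line:
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)


def annotate(peptides, proteins):
    """Map each peptide to the ';'-joined accessions whose sequence contains it."""
    return {
        pep: ";".join(acc for acc, seq in proteins if pep in seq)
        for pep in peptides
    }


# HYPOTHETICAL toy entries; a real run would read the UniProtKB FASTA file.
toy_fasta = """\
>sp|TOY1|TOY1_HUMAN made-up entry
MAGKVSTVSELVTPLR
>sp|TOY2|TOY2_HUMAN made-up entry
MKLLNPQRST
""".splitlines()

proteins = list(read_fasta(toy_fasta))
protein_ids = annotate(["VSTVSELVT", "AAAAAAAAA"], proteins)
print(protein_ids)
```

The resulting mapping could then be written into the library's ProteinId column; peptides with no match keep an empty string.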