vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Loading Alphapeptdeep library takes very long time #671

Open Maithy15 opened 1 year ago

Maithy15 commented 1 year ago

Dear Vadim,

I am trying out an in silico library generated by alphapeptdeep. The library is about 14 GB in size. I generated it with carbamidomethyl and dimethyl (light) as fixed mods and protein N-terminal acetyl as a variable mod. It takes a very long time for DIA-NN to load this library.

2 files will be processed
[0:00] Loading spectral library E:\Maithy\Maithy_plex\spec_lib_peptdeep_prediction\crr_sett_without_decoy\predict.speclib.tsv
[2446:15] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[2446:29] Spectral library loaded: 20808 protein isoforms, 31217 protein groups and 7149200 precursors in 2398374 elution groups.
[2446:29] Loading FASTA E:\Maithy\Maithy_plex\spec_lib_peptdeep_prediction\20221214_UP000005640_9606_human.fasta
[2458:53] Reannotating library precursors with information from the FASTA database
[2459:04] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[2459:04] 7149200 precursors generated
[2459:04] Gene names missing for some isoforms
[2459:04] Library contains 20562 proteins, and 20335 genes
[2459:04] Splitting library entries across channels
[2462:43] 87620 unlabelled precursors detected
[2464:40] Assembling elution groups
[2466:11] Initialising library

Do you think I should just reduce the size of the library by not using variable mods, or is there a way to make DIA-NN read this library faster?

Thanks, Maithy
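
To see which step dominates, the gaps between consecutive timestamps in the log above can be computed. The sketch below is illustrative only: it assumes the bracketed timestamps are [minutes:seconds] since start, and it uses an abridged copy of the log (paths and lines sharing a timestamp are dropped). Under that assumption, the first step, "Loading spectral library", runs from 0:00 to 2446:15, which is roughly 40.8 hours and about 99% of the total shown.

```python
import re

# Abridged excerpt of the log above. Assumption: timestamps are [minutes:seconds].
LOG = """\
[0:00] Loading spectral library
[2446:15] Finding proteotypic peptides
[2446:29] Loading FASTA
[2458:53] Reannotating library precursors with information from the FASTA database
[2459:04] Splitting library entries across channels
[2462:43] 87620 unlabelled precursors detected
[2464:40] Assembling elution groups
[2466:11] Initialising library
"""

STAMP = re.compile(r"\[(\d+):(\d{2})\]\s*(.*)")


def parse(log):
    """Return (seconds_since_start, message) for every timestamped line."""
    entries = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            seconds = int(m.group(1)) * 60 + int(m.group(2))
            entries.append((seconds, m.group(3)))
    return entries


def step_durations(entries):
    """Duration of a step = gap until the next timestamp, in seconds."""
    return [
        (msg, nxt - start)
        for (start, msg), (nxt, _) in zip(entries, entries[1:])
    ]


durations = step_durations(parse(LOG))

# Print the steps from slowest to fastest, in hours.
for msg, secs in sorted(durations, key=lambda d: -d[1]):
    print(f"{secs / 3600:6.2f} h  {msg}")
```

The last log line has no following timestamp, so its duration is unknown and it is left out of the result.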

vdemichev commented 1 year ago

Hi Maithy,

The time is probably spent loading the .tsv from disk. Removing unnecessary columns from the .tsv, if any, should help.

Best, Vadim
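
The column-trimming suggestion above could be sketched as follows. This is a minimal example, not a DIA-NN feature: the function name `keep_columns` and the column names in the demonstration are made up, and which columns DIA-NN actually needs is not stated in this thread, so check that against the DIA-NN documentation before deleting anything. The file is streamed line by line so a 14 GB library never has to fit in memory; the sketch assumes a plain tab-separated file with no tabs or newlines inside fields.

```python
import os
import tempfile


def keep_columns(src_path, dst_path, wanted):
    """Copy a tab-separated file, keeping only the columns named in `wanted`."""
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        header = src.readline().rstrip("\r\n").split("\t")
        missing = [c for c in wanted if c not in header]
        if missing:
            raise KeyError(f"columns not found in header: {missing}")
        idx = [header.index(c) for c in wanted]
        dst.write("\t".join(wanted) + "\n")
        # Stream the rest of the file one row at a time.
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            dst.write("\t".join(fields[i] for i in idx) + "\n")


# Tiny demonstration on a throwaway file; the column names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "full.tsv")
    dst = os.path.join(tmp, "trimmed.tsv")
    with open(src, "w", encoding="utf-8", newline="") as fh:
        fh.write("needed_a\tunused\tneeded_b\n1\tx\t2\n3\ty\t4\n")
    keep_columns(src, dst, ["needed_a", "needed_b"])
    with open(dst, encoding="utf-8", newline="") as fh:
        trimmed = fh.read()

print(trimmed)
```

Writing the trimmed copy to a new path keeps the original library intact, so the result can be compared against it before the large file is discarded.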