vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Out of memory while processing cohort data #1211

Open huangcx1539 opened 1 month ago

huangcx1539 commented 1 month ago

Hi Vdemichev,

I am using DIA-NN 1.8.1 to quantify 2000 MS files, for which .quant files have already been created. The machine has only 256 GB of RAM. In the second pass, when the newly created spectral library is used to reanalyse the data, the software quits after processing more than 300 files due to insufficient memory. Is there a parameter that limits the memory used by DIA-NN, so that the task can complete?

Best, huangcx

vdemichev commented 1 month ago

Hi huangcx,

What does the log look like?

In general, it's always possible to split any analysis into batches.
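A minimal sketch of splitting a large cohort into batches, where each batch becomes its own DIA-NN invocation. It only uses flags that appear in this thread (`--f`, `--lib`, `--threads`, `--out`); the executable name `diann`, the helper names, the batch size of 300 and the `/data/*/*.d` layout are hypothetical assumptions. Each batch is a separate analysis, so its reports cover only that batch's runs.

```python
import shlex
from pathlib import Path


def make_batches(runs, batch_size):
    """Split a list of run paths into consecutive batches of at most batch_size runs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [runs[i:i + batch_size] for i in range(0, len(runs), batch_size)]


def batch_command(runs, lib, out_dir, batch_no, threads=95):
    """Build the argument list for one batch (hypothetical helper)."""
    args = ["diann"]  # assumption: name/path of the DIA-NN executable on your system
    for run in runs:
        args += ["--f", str(run)]  # one --f per raw file in this batch
    args += ["--lib", str(lib),
             "--threads", str(threads),
             "--out", str(Path(out_dir) / f"report_batch{batch_no}.tsv")]
    return args


if __name__ == "__main__":
    # Assumption: each .d folder sits in its own subfolder of /data, as in the log in this thread.
    runs = sorted(str(p) for p in Path("/data").glob("*/*.d"))
    for n, batch in enumerate(make_batches(runs, 300), start=1):
        print(shlex.join(batch_command(batch, "/data/Result/report-lib.tsv", "/data/Result", n)))
```

With 2114 runs and a batch size of 300 this yields eight batches, the last one holding 14 runs.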

Best, Vadim

huangcx1539 commented 1 month ago

Hi Vadim,

The task ended abruptly. The system log showed that the process was killed because the machine ran out of memory.

[3972:01] File #319/2114
[3972:01] Loading run /data/TOF4_DIA_16PASEF_20220429_293T_200ng_300nLmint_120min_column0419_R2_GB2_1_443/TOF4_DIA_16PASEF_20220429_293T_200ng_300nLmint_120min_column0419_R2_GB2_1_443.d
[3973:17] 277769 library precursors are potentially detectable
[3973:17] Processing...
[3973:26] RT window set to 2.25324
[3973:26] Ion mobility window set to 0.0264466
[3973:26] Recommended MS1 mass accuracy setting: 14.3672 ppm
[3974:23] Removing low confidence identifications
[3974:23] Searching PTM decoys
[3974:24] Removing interfering precursors
[3974:39] Training neural networks: 260719 targets, 270557 decoys
[3974:52] Number of IDs at 0.01 FDR: 171127
[3975:00] Calculating protein q-values
[3975:01] Number of genes identified at 1% FDR: 10354 (precursor-level), 9644 (protein-level) (inference performed using proteotypic peptides only)
[3975:01] Quantification
[3975:02] Precursors with monitored PTMs at 1% FDR: 1562 out of 1725
[3975:02] Unmodified precursors with monitored PTM sites at 1% FDR: 483 out of 513

[3975:05] File #320/2114
[3975:06] Loading run /data/TOF4_DIA_16PASEF_20220816_293T_200ng_300nLmint_120min_column0802_QC_GB4_1_1578/TOF4_DIA_16PASEF_20220816_293T_200ng_300nLmint_120min_column0802_QC_GB4_1_1578.d

Best, huangcx

vdemichev commented 1 month ago

If this is a problem with the specific run TOF4_DIA_16PASEF_20220816_293T_200ng_300nLmint_120min_column0802_QC_GB4_1_1578.d, it will occur regardless of how many runs were analysed before, so in that case the solution is to exclude this run. If it occurs on random runs, then it might be a RAM problem, and it makes sense to monitor RAM usage to confirm. However, this looks strange: with such a small library, 256 GB should never be exhausted.
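To tell a run-specific failure from a random one, one option is to compare which run was being loaded when each crashed attempt stopped. The sketch below pulls the last `File #n/N` counter and `Loading run` path out of a log, using the line format quoted in this thread; the function name is hypothetical, and it assumes run paths contain no spaces.

```python
import re

# Patterns based on log lines quoted in this thread, e.g.
# "[3975:05] File #320/2114" and "[3975:06] Loading run /data/.../run.d"
FILE_RE = re.compile(r"File #(\d+)/(\d+)")
LOAD_RE = re.compile(r"Loading run (\S+)")  # assumes the run path has no spaces


def last_loaded_run(log_text):
    """Return (file_number, total_files, run_path) for the last run that started loading, or None."""
    files = FILE_RE.findall(log_text)  # list of (number, total) string tuples
    loads = LOAD_RE.findall(log_text)  # list of run paths
    if not files or not loads:
        return None
    number, total = files[-1]
    return int(number), int(total), loads[-1]
```

If the logs of two crashed attempts report the same path, the run itself is the likely culprit; if the path differs between attempts, memory pressure is the more plausible explanation.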

In the second pass, when the newly created spectral library is used to reanalyse the data

Try relaunching using only the newly generated DIA-based library, with MBR off?
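A small sketch of deriving the relaunch arguments from the original ones: point `--lib` at the DIA-based library from the first search and drop `--reanalyse`, which is the flag removed in searches No.2 and No.3 in this thread to turn MBR off. The helper name is hypothetical, and all other flags are passed through unchanged.

```python
def second_run_args(args, new_lib):
    """Copy a DIA-NN argument list with --lib set to new_lib and --reanalyse (MBR) removed."""
    result = []
    replaced = False
    i = 0
    while i < len(args):
        token = args[i]
        if token == "--reanalyse":
            i += 1  # MBR off: skip the flag itself
            continue
        if token == "--lib":
            result += ["--lib", new_lib]
            replaced = True
            i += 2  # skip the flag and its old value
            continue
        result.append(token)
        i += 1
    if not replaced:
        result += ["--lib", new_lib]  # no --lib was present: add one
    return result
```

For example, `second_run_args(shlex.split(original_cmd), "/data/Result/report-lib.tsv")` would turn the No.1 arguments into an MBR-off run against the first-pass library.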

huangcx1539 commented 1 month ago

Hi Vadim,

I have tested different parameters (MBR). The second to fourth searches completed successfully, but the number of proteins identified was lower than in the first search.

For TOF3_DIA_20230812_WY_293T_200ng_300nl_110min_75umID_QC_Slot1-25_1_7784.d in search No.1:
[3033:22] Second pass: using the newly created spectral library to reanalyse the data, ...
[3036:13] Number of genes identified at 1% FDR: 10287 (precursor-level), 9517 (protein-level) (inference performed using proteotypic peptides only)

In the other searches, the final protein-level number was 8949 (the same as in Report-first-pass.stats.tsv from search No.1).

No.1 (initial parameters):
--lib "" --threads 95 --verbose 1 --out /data/Result/Report.tsv --out-lib /data/Result/report-lib.tsv --gen-spec-lib --qvalue 0.01 --matrices --predictor --fasta /Fasta/211203-uniprot-human.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K,R --missed-cleavages 2 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 5 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --monitor-mod UniMod:1 --reanalyse --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal --use-quant --no-norm

No.2 ('--reanalyse' removed; report-lib.tsv was generated by the first search):
--lib /data/Result/report-lib.tsv --threads 95 --verbose 1 --out /data/Result/Report.tsv --out-lib "" --gen-spec-lib --qvalue 0.01 --matrices --predictor --fasta /Fasta/211203-uniprot-human.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K,R --missed-cleavages 2 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 5 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --monitor-mod UniMod:1 --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal --use-quant --no-norm

No.3 ('--reanalyse' removed; report-lib.tsv.speclib was generated by the first search):
--lib /data/Result/report-lib.tsv.speclib --threads 95 --verbose 1 --out /data/Result/Report.tsv --out-lib /data/Result/report-lib.tsv --gen-spec-lib --qvalue 0.01 --matrices --predictor --fasta /Fasta/211203-uniprot-human.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K,R --missed-cleavages 2 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 5 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --monitor-mod UniMod:1 --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal --use-quant --no-norm

No.4 (report-lib.tsv.speclib was generated by the first search):
--lib /data/Result/report-lib.tsv --threads 95 --verbose 1 --out /data/Result/Report.tsv --out-lib "" --gen-spec-lib --qvalue 0.01 --matrices --predictor --fasta /Fasta/211203-uniprot-human.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K,R --missed-cleavages 2 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 5 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --monitor-mod UniMod:1 --reanalyse --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal --use-quant --no-norm
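With four long command lines it is easy to miss what actually changed between searches. A minimal sketch for diffing two of them, assuming every option starts with `--` and takes at most one value (a following token that is not itself an option); repeated options such as `--var-mod` are counted separately. The function names are hypothetical.

```python
import shlex
from collections import Counter


def parse_options(cmdline):
    """Turn a DIA-NN command line into a Counter of (flag, value) pairs; value is None for bare flags."""
    tokens = shlex.split(cmdline)
    pairs = Counter()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                pairs[(tok, tokens[i + 1])] += 1  # flag with a value, e.g. --lib ""
                i += 2
                continue
            pairs[(tok, None)] += 1  # bare flag, e.g. --reanalyse
        i += 1  # tokens that do not follow a flag are ignored
    return pairs


def diff_options(cmd_a, cmd_b):
    """Return (only_in_a, only_in_b) as sorted lists of (flag, value) pairs."""
    a, b = parse_options(cmd_a), parse_options(cmd_b)
    key = lambda p: (p[0], p[1] or "")  # avoids comparing None with str
    return sorted((a - b).elements(), key=key), sorted((b - a).elements(), key=key)
```

Applied to the full No.1 and No.2 lines, this should report only the `--lib`, `--out-lib` and `--reanalyse` differences.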

huangcx1539 commented 1 month ago

[Screenshot: memory usage monitored while the task was running]
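As an alternative to screenshots, memory can be logged over time. A minimal, Linux-only sketch, assuming the DIA-NN process ID is known: it polls the process's `VmRSS` from `/proc/<pid>/status` and the system-wide `MemAvailable` from `/proc/meminfo` (both reported in kB). The function names and the 10-second interval are hypothetical choices.

```python
import time
from pathlib import Path


def parse_kb_field(text, field):
    """Return the kB value of a field such as 'VmRSS' from /proc-style 'Name: value kB' text."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return None


def watch(pid, interval_s=10.0, samples=None):
    """Print process RSS and system MemAvailable (GiB) until the process exits or `samples` readings."""
    count = 0
    while samples is None or count < samples:
        try:
            status = Path(f"/proc/{pid}/status").read_text()
        except FileNotFoundError:
            print(f"process {pid} has exited")
            return
        rss_kb = parse_kb_field(status, "VmRSS")
        if rss_kb is None:  # e.g. a zombie process no longer reports VmRSS
            print(f"process {pid} no longer reports VmRSS")
            return
        avail_kb = parse_kb_field(Path("/proc/meminfo").read_text(), "MemAvailable")
        print(f"{time.strftime('%H:%M:%S')} RSS={rss_kb / 1024**2:.1f} GiB "
              f"available={avail_kb / 1024**2:.1f} GiB", flush=True)
        count += 1
        time.sleep(interval_s)
```

Redirecting the output to a file gives a timeline that can be lined up against the `[mm:ss]` timestamps in the DIA-NN log.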

vdemichev commented 3 weeks ago

I have tested different parameters (MBR). The second to fourth searches completed successfully, but the number of proteins identified was lower than in the first search.

If you think this is unexpected, I can take a look at the logs (preferably the full logs).

In the other searches

What was different about the settings in this one?

--fasta-search

This must never be present in the same analysis as --f: FASTA digest must not be combined with raw data analysis, and DIA-NN prints a warning about this.
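A trivial pre-flight check along these lines can catch the conflict before a multi-day run is launched. This is a sketch only (the function name is hypothetical); it simply tokenizes the command line and looks for both `--fasta-search` and at least one `--f`.

```python
import shlex


def fasta_search_conflict(cmdline):
    """Return True if a DIA-NN command line combines --fasta-search with raw-data input (--f)."""
    tokens = shlex.split(cmdline)
    return "--fasta-search" in tokens and "--f" in tokens
```

Because it matches whole tokens, options that merely start with `--f`, such as `--fasta`, do not trigger it.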