uclahs-cds / pipeline-call-NonCanonicalPeptide

Nextflow pipeline to call non-canonical peptides as custom databases for proteogenomic analysis
https://automatic-adventure-o4l96o9.pages.github.io/
GNU General Public License v2.0

resources config files added for F16, F32, F72, and M64 #74

Closed zhuchcn closed 2 years ago

zhuchcn commented 2 years ago

@lydiayliu Is 10 GB enough for parsers?

Closes #73

lydiayliu commented 2 years ago

> Is 10 GB enough for parsers?

I don't think so. Everything that needs to load the index should get at least 15 GB.

I gathered the logs from CCLE:

It's on the conservative side, but everything was ~11.1 GB. I agree with giving callVariant 30 GB for the worst-case scenario, though. I don't know if I want to give callVariant more than that, cuz needing more would signal an issue.

lydiayliu commented 2 years ago

Why don't we take the opportunity to add the values for the rest of the pipeline steps?

split_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/split_fasta.trace.txt

merge_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/merge_fasta.trace.txt

filter_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_filter_split_0001_DONT_USE/pipeline-meta-call-NonCanonicalPeptide-0.0.1/filter_fasta.trace.txt

decoy_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/decoy_fasta.trace.txt

encode_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/encode_fasta.trace.txt

zhuchcn commented 2 years ago

That's great! Really appreciate you gathering all the information! It's a little surprising to see how high the memory usage is for processes that I thought were low-memory. Are the recommended memory values you gave the worst case? I'm a little confused because in this file (/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/split_fasta.trace.txt), the worst case I can see is 9.6 GB for rss and 10.7 GB for vmem. Did you just add some extra on top to make it safe?
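For anyone who wants to repeat this check, here is a minimal Python sketch of reading the worst-case rss and vmem out of a Nextflow trace file. It assumes the trace is tab-separated with `rss` and `vmem` columns holding human-readable sizes such as `9.6 GB`; the column names, the helper names, and the sample rows are all assumptions for illustration, not the contents of the actual CCLE trace files.

```python
import csv
import io
import re

# Multipliers for human-readable sizes such as "9.6 GB" (binary units assumed).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size_gb(text):
    """Convert '9.6 GB', '512 MB', or a raw byte count to GB; None if unparseable."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)?\s*", text)
    if match is None:
        return None  # e.g. "-" for a missing value
    value = float(match.group(1))
    unit = match.group(2) or "B"
    return value * _UNITS[unit] / _UNITS["GB"]


def peak_memory_gb(trace_text, columns=("rss", "vmem")):
    """Return the largest value seen in each memory column, in GB."""
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    peaks = {}
    for row in reader:
        for col in columns:
            size = parse_size_gb(row.get(col) or "")
            if size is not None:
                peaks[col] = max(peaks.get(col, 0.0), size)
    return peaks


# Hypothetical rows, made up to mirror the figures quoted in this thread.
sample = (
    "task_id\tname\trss\tvmem\n"
    "1\tsplit_fasta (a)\t9.6 GB\t10.7 GB\n"
    "2\tsplit_fasta (b)\t512 MB\t1.1 GB\n"
    "3\tsplit_fasta (c)\t-\t-\n"
)
peaks = peak_memory_gb(sample)
print(peaks)
```

To run it on a real trace, read the file and pass its text in, e.g. `peak_memory_gb(open(path).read())`, adjusting `columns` if the trace uses `peak_rss` and `peak_vmem` instead.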

lydiayliu commented 2 years ago

Feel free to adjust! I just added some extra numbers and rounded to an even number lol, no real reason. But CCLE is definitely a conservative estimate considering there are very few mutations. I just don't have the numbers on CPCG
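The "add some extra and round to an even number" rule can be sketched in a few lines of Python. The function name and the 2 GB default headroom are assumptions for illustration; the thread does not say how much extra was actually added.

```python
import math


def recommend_memory_gb(peak_gb, headroom_gb=2.0):
    """Pad an observed peak and round up to the next even number of GB.

    headroom_gb is a hypothetical default, not a value taken from the thread.
    """
    padded = peak_gb + headroom_gb
    return int(math.ceil(padded / 2.0) * 2)


# Using the 10.7 GB vmem peak quoted above for split_fasta.
print(recommend_memory_gb(10.7))  # -> 14
```

Since CCLE has very few mutations, a larger `headroom_gb` may be the safer choice for datasets with heavier mutation loads.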

zhuchcn commented 2 years ago

Just modified the config files according to your recommendation.