Closed by zhuchcn 2 years ago
Is 10 GB enough for parsers?
I don't think so. Everything that needs to load the index should get at least 15 GB.
I gathered the logs from CCLE:
/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30/pipeline-meta-call-NonCanonicalPeptide-0.0.1/call_parsers.trace.txt
/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30/pipeline-meta-call-NonCanonicalPeptide-0.0.1/call_variant.trace.txt
It's on the conservative side, but everything was around 11.1 GB. I agree with giving callVariant 30 GB for the worst-case scenario, though. I don't know if I want to give callVariant more than that, because needing more would signal an issue.
Why don't we take the opportunity to add the values for the rest of the pipeline steps?
split_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/split_fasta.trace.txt
merge_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/merge_fasta.trace.txt
filter_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_filter_split_0001_DONT_USE/pipeline-meta-call-NonCanonicalPeptide-0.0.1/filter_fasta.trace.txt
decoy_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/decoy_fasta.trace.txt
encode_fasta: /hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/encode_fasta.trace.txt
That's great! Really appreciate you gathering all the information! It's a little surprising to see how high the memory usage is for processes that I thought were low-memory. Are the recommended memory values you gave the worst case? I'm a little confused because in this file (/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_merge_split/pipeline-meta-call-NonCanonicalPeptide-0.0.1/split_fasta.trace.txt), the worst case I can see seems to be 9.6 GB for rss and 10.7 GB for vmem. Did you just add some extra to make it safe?
Feel free to adjust! I just added some extra and rounded to an even number lol, no real reason. But CCLE is definitely a conservative estimate, considering there are very few mutations. I just don't have the numbers on CPCG.
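For reference, the "worst case plus a bit extra, rounded to an even number" approach could be sketched roughly as below. This is only an illustration, not what was actually run to produce the recommendations: it assumes the trace files are Nextflow's default tab-separated traces with peak_rss / peak_vmem columns in human-readable units (e.g. "9.6 GB"), and the function names and the 20% headroom are made up for the example.

```python
import csv
import math

# Byte multipliers for human-readable sizes.
# Assumption: the units in the trace file are binary multiples.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a string such as '9.6 GB' to bytes; return None for '-' or blanks."""
    parts = (text or "").strip().split()
    if len(parts) != 2 or parts[1] not in _UNITS:
        return None
    return float(parts[0]) * _UNITS[parts[1]]


def peak_memory_gb(trace_lines, column="peak_rss"):
    """Largest value of `column` over all tasks, in GB.

    `trace_lines` is an open trace file or any iterable of text lines.
    """
    rows = csv.DictReader(trace_lines, delimiter="\t")
    sizes = [parse_size(row.get(column)) for row in rows]
    sizes = [size for size in sizes if size is not None]
    if not sizes:
        raise ValueError(f"no usable values in column {column!r}")
    return max(sizes) / _UNITS["GB"]


def recommend_gb(peak_gb, headroom=1.2):
    """Pad the observed peak by `headroom`, then round up to an even number of GB."""
    return 2 * math.ceil(peak_gb * headroom / 2)
```

With the 9.6 GB rss peak mentioned above and a 20% margin, recommend_gb gives 12 GB; with the 10.7 GB vmem peak it gives 14 GB. Since CCLE has very few mutations, a larger headroom may be more appropriate for other cohorts.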
Just modified the config files according to your recommendation.
@lydiayliu Is 10 GB enough for parsers?
[X] I have read the code review guidelines and the code review best practice on GitHub check-list.
[X] The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
[X] I have set up or verified the branch protection rule following the github standards before opening this pull request.
[X] I have added my name to the contributors listings in the metadata.yaml and the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
[ ] I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
[ ] I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
[X] All test cases have passed.
Closes #73