I worked on some TODOs in the `download_proteins.py` script, mainly double-checking what exactly is downloaded, how many proteins are retrieved, and how the largest assembly is chosen. I left comments wherever needed and tried to make the script easier to read. The main change of this PR addresses a limit in E-utilities, which only accepts up to 9999 IDs per request: for larger experimental setups, I now split the download requests into chunks.
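For reviewers, here is a minimal sketch of the chunking idea, assuming the 9999-ID limit applies to each E-utilities call the script makes. The names `chunked`, `fetch_in_chunks`, and `MAX_IDS_PER_REQUEST` are hypothetical and not taken from `download_proteins.py`. The `fetch_fn` argument stands in for the real request (for example, a wrapper around Biopython's `Entrez.efetch`), so the splitting logic can be checked without network access.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Hypothetical constant: per-request ID limit as described in this PR.
MAX_IDS_PER_REQUEST = 9999


def chunked(items: Sequence[T], size: int = MAX_IDS_PER_REQUEST) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_chunks(
    ids: Sequence[str],
    fetch_fn: Callable[[List[str]], R],
    size: int = MAX_IDS_PER_REQUEST,
) -> List[R]:
    """Call `fetch_fn` once per chunk of IDs and collect the responses in order.

    `fetch_fn` is a hypothetical placeholder for the actual E-utilities request,
    e.g. lambda chunk: Entrez.efetch(db="protein", id=chunk,
                                     rettype="fasta", retmode="text").read()
    """
    return [fetch_fn(list(chunk)) for chunk in chunked(ids, size)]


if __name__ == "__main__":
    # Offline demo: the stand-in fetch function just reports the chunk size.
    demo_ids = [str(i) for i in range(25000)]
    print(fetch_in_chunks(demo_ids, len))  # [9999, 9999, 5002]
```

Passing the request function in as a parameter keeps the batching separate from the network code, so the chunk boundaries can be tested on their own.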
PR checklist
[x] This comment contains a description of changes (with reason).
[ ] If you've fixed a bug or added code that should be tested, add tests!
[ ] If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
[ ] If necessary, also make a PR on the nf-core/metapep branch of the nf-core/test-datasets repository.
[ ] Make sure your code lints (`nf-core lint`).
[ ] Ensure the test suite passes (`nf-test test main.nf.test -profile test,docker`).
[ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
[ ] Usage Documentation in `docs/usage.md` is updated.
[ ] Output Documentation in `docs/output.md` is updated.
[ ] `CHANGELOG.md` is updated.
[ ] `README.md` is updated (including new tool citations and authors/contributors).