Closed jfy133 closed 9 months ago
nf-core lint
overall result: Passed :white_check_mark: :warning:Posted for pipeline commit a317cbe
+| ā
160 tests passed |+
#| ā 1 tests were ignored |#
!| ā 21 tests had warnings |!
I know this is very early work. I'm fine with merging as is but I had two concerns that I propose to change in the long run:
Do you mind if you give me an approval then I can merge this in and follow up depending on feedback of my status below? Then can start doing more parallel PRs to add the below
- There are two separate input options for
nodesdmp
andnamesdmp
. I would expect a path/tar of a directory with those files (possibly containing more of the dump files).
Is expectation this based purely on ncbi taxdump files? I currently had kept it separate because if people want to make custom databases that may include customised dump files whereby I don't see why one would necessarily re-tar...
That said it is reasonable to expect someone may want to do that... I may consider adding it as an option (if one gives the taxdump as tar it'll auto extract - but I vaguely remember plucking specific files from a directory is not directly trivial with Nxf).
But I would make this a separate issue as a separate functionality (including maybe auto downloading taxdump files, but I'm not sure yet how to do this properly e.g. with gtdbtk taxdump stuff etc)
- I would create one subworkflow per tool that the main createtaxdb workflow calls to.
Yes that's my plan for multi-module database construction commands (e.g. kraken2), or do you have a motivation to do this for single build modules too?
Do you mind if you give me an approval then I can merge this in and follow up depending on feedback of my status below?
I didn't approve due to the open comments. Do you plan to address them or see them as irrelevant?
Is expectation this based purely on ncbi taxdump files? I currently had kept it separate because if people want to make custom databases that may include customised dump files whereby I don't see why one would necessarily re-tar...
That said it is reasonable to expect someone may want to do that... I may consider adding it as an option (if one gives the taxdump as tar it'll auto extract - but I vaguely remember plucking specific files from a directory is not directly trivial with Nxf).
My expectation is based on taxonkit usage, yes. With a custom taxonomy I would just pass a path to a directory and then expect {dir}/nodes.dmp
and {dir}/names.dmp
to exist.
Yes that's my plan for multi-module database construction commands (e.g. kraken2), or do you have a motivation to do this for single build modules too?
No, I don't see a reason for single modules. I didn't read the file properly and somehow thought the input channel transformation was Kaiju-specific.
Do you mind if you give me an approval then I can merge this in and follow up depending on feedback of my status below?
I didn't approve due to the open comments. Do you plan to address them or see them as irrelevant?
Hm, I thought I had addressed them š¤, but I see now there is no commit. Maybe I didn't push...
Is expectation this based purely on ncbi taxdump files? I currently had kept it separate because if people want to make custom databases that may include customised dump files whereby I don't see why one would necessarily re-tar...
That said it is reasonable to expect someone may want to do that... I may consider adding it as an option (if one gives the taxdump as tar it'll auto extract - but I vaguely remember plucking specific files from a directory is not directly trivial with Nxf).
My expectation is based on taxonkit usage, yes. With a custom taxonomy I would just pass a path to a directory and then expect
{dir}/nodes.dmp
and{dir}/names.dmp
to exist.
Hrm, ok. I'll maybe look more in detail to taxonkit and try and use that as a structure to follow. But I think I would still do that as a follow up PR (because it should be quite straightforward to just change how those files are picked up and passed to the module).
Yes that's my plan for multi-module database construction commands (e.g. kraken2), or do you have a motivation to do this for single build modules too?
No, I don't see a reason for single modules. I didn't read the file properly and somehow thought the input channel transformation was Kaiju-specific.
Ah ok! Then I leave that bit as is for now at least.
OK thank you for the reviews @maxulysse @Joon-Klaps @Midnighter !
I've addressed all the suggestions now (except for the larger one from @Midnighter regarding taxdump, but I'll do that as a follow up), so I will merge so others can start to get involved, and other unaddressed changes can be dealt with in a follow up :)
This is more of a PoC to draft rough structure. Subject to change during development.
~Note input validation not working correctly, as only need to require one of either fasta_dna or fasta_aa, or both. But if neither supplied then it just runs with empty lists :(~ now working thanks to @mirpedrol :D
Adds:
database building and nf-tests :)
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).