Closed ewels closed 3 years ago
Before a more in-depth look at the code, here are a few things that need resolving from a first glance:
You’re a bit behind the main nf-core template. You created the TEMPLATE
branch with version 1.9
of nf-core/tools and we’re on version 1.10.2
(about to release 1.10.3
). The new minor patch release might come very soon so hopefully you’ll get an automated sync PR if that happens.
Either way, you need to update the TEMPLATE
branch with the latest version and then bring those updates across to the main pipeline code. The update should hopefully not be too tough, mostly additions. For example, you're missing a bunch of GitHub actions files (kmermaid vs template) which will be added.
Generally the tests are looking great - no failures ✅ , however there are a few warnings. These are mostly about updating a bunch of conda packages (no need to do all of these, but some look like they should be easy). The others should be fixed by the template update.
nf-core lint
overall result: Passed with warnings ❗
Posted for pipeline commit e1f736c
+| ✅ 109 tests passed |+
!| ❗ 21 tests had warnings |!
-| ❌ 0 tests failed |-
I can see a few files that can be / should be removed I think:
Makefile
environment_hulk.yml
testing
directorey
nf-core/test-datasets
repo instead (there's already a branch for kmermaid)bin/markdown_to_html.r
- this was replaced by the template python version and is no longer usedHello everyone! Thank you for the detailed comments. Responding to @MaxUlysse's comments below (which addresses @ewels's concerns and more!):
Switching to a single comment instead of commenting into files, not to spam everyone.
.gitignore
seems to be a bit overkill to me.
Not sure that this is a major problem? There are Python files in the bin
and when I edit with PyCharm, then excess files get created. It seems to be fine to specify all the possible files that may want to be ignored, but open to suggestions!
Dockerfile
I guess that the last lines are to test that tools are working. Is it really necessary?
Yes, it helps to make sure everything got installed properly in the Docker image. I've had issues with the Docker image even working in the first place without having those checks in the past.
Makefile
I don't think it's actually useful in the repo
This speeds up my local workflow to build the docker image locally and not have to look up the docker build
command every time. I can remove the docker push
rule if that would help.
README
:
- There are still some
TODO
comments left.
👍
- Usage should be made less specific
👍
- I think you can already set up the Zenodo since you have a tag
👍
- There is two
assets/email_template
file, not sure which one we should keephtml
ortxt
According to the latest nf-core/tools
template, both are needed? https://github.com/nf-core/tools/tree/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/assets
assets/rrna-db-defaults.txt
what is this file used for?
It's an input to SortMeRNA for removing ribosomal reads.
- Remove
bin/markdown_to_html.r
,conf/awsbatch.config
,docs/configuration/adding_your_own.md
,docs/installation.md
,docs/troubleshooting.md
👍
docker/sysctl.conf
what is this file used for?
It was needed to deal with some temporary directory issues in Docker, and I think to help with some error messages.
params
--csv_pairs
and--csv_singles
could both be replaced by a single--input
. since you have a header, you could check the number of column to asses if single or double.
I would like users to be able to provide both --csv_pairs
and --csv_singles
at once, rather than a single --input
. There are cases where one would want to run the pipeline, which does an end-to-end k-mer similarity comparison, and get a matrix of sample-sample similarities in the end. It's not like with nf-core/rnaseq
where the matrices can simply be concatenated, all the samples can be provided at once.
--fastas
and--protein_fastas
why plural?
They are not a reference genome or proteome like in other pipelines (the only user-provided reference data is --reference_proteome_fasta
). They are input options to the pipeline, so they are plural. --fastas
is for nucleotide sequence input, e.g. for a microbial genome or a fasta of human transcripts, and --protein
is for protein sequence, e.g. for protein-coding transcripts.
This pipeline takes as input several possible sequences, optionally remove ribosomal RNA, optionally translate to nucleotide -> protein, then subsamples k-mers and compute an all-by-all k-mer similarity. The diversity of input options is intentional, such as to provide the ability to input:
And all of the above at once! This pipeline is supposed to be as flexible as possible to compare the maximum number of samples' k-mer similarities. It's typically not a "one-and-done" pipeline - I often run it many times on different configurations of the same samples.
--removeRiboRNA
,--saveNonRiboRNAReads
,--rNA_database_manifest
are not snake_cased.
This was copied from nf-core/rnaseq
with abandon and lazily not changed. :)
--molecule
or--molecules
, I've seen both in the docs and configs so far.
👍
--save_fastas
should be--save_fasta_dir
👍
- Why the
environment_hulk.yml
file?
We have a local machine that needs a lot of love and care when installing kmermaid
so we made it a special environment file.
nextflow.config
not up to date withTEMPLATE
These PRs didn't do it?
- there should not be any testing data in the repo, but in the
nf-core/test-datasets
one, on thekmermaid
branch
👍
Apart from these tiny comments, I am quite happy with the pipeline, I think you're mainly missing updates from the TEMPLATE, and maybe some input params improvement.
🎉 Thanks! Glad to hear! 🎉
Thanks for your reply and comments. Just replying to ones that need to, as you were very clear on all the other comments.
.gitignore
seems to be a bit overkill to me.Not sure that this is a major problem? There are Python files in the
bin
and when I edit with PyCharm, then excess files get created. It seems to be fine to specify all the possible files that may want to be ignored, but open to suggestions!
Not a problem at all, I agree, it was more a remark.
Dockerfile
I guess that the last lines are to test that tools are working. Is it really necessary?Yes, it helps to make sure everything got installed properly in the Docker image. I've had issues with the Docker image even working in the first place without having those checks in the past.
OK, then I have no issue keeping that, I'm pretty sure that with DSL2 we're moving out of Dockerfiles anyway
Makefile
I don't think it's actually useful in the repoThis speeds up my local workflow to build the docker image locally and not have to look up the
docker build
command every time. I can remove thedocker push
rule if that would help.
You said it yourself: This speeds up my local workflow
, I don't think it belongs here.
- There is two assets/email_template file, not sure which one we should keep html or txt
According to the latest nf-core/tools template, both are needed? https://github.com/nf-core/tools/tree/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/assets
Totally right, my bad...
- Why the
environment_hulk.yml
file?We have a local machine that needs a lot of love and care when installing
kmermaid
so we made it a special environment file.
You said it yourself: We have a local machine
, I don't think it belongs here.
BTW, we do have a Hulk
as well at NGI ;-)
nextflow.config
not up to date withTEMPLATE
These PRs didn't do it?
* #93 * #110
At least these lines are not in your config file: https://github.com/nf-core/tools/blob/f14c7a5692c96896c2ed1b730bc355093dca2247/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/nextflow.config#L82-L87
(That was what make me thinking that it might not have been up to date to start with)
I will have a more in depth look to make sure
Apart from these tiny comments, I am quite happy with the pipeline, I think you're mainly missing updates from the TEMPLATE, and maybe some input params improvement.
:tada: Thanks! Glad to hear! :tada:
Looking forward for the big release!!!
Regarding the template merge, I guessed you used version 1.9 because of this line: https://github.com/nf-core/kmermaid/blob/7907ca5dc827fd69895a6aeea8d17c39f72184d7/Dockerfile#L1
But if you don't think that you did, then it probably warrants a closer comparison between the template files and the pipeline files...
+1 for removing institute specific stuff. I know it's a hassle, we had to do similar things for nearly all of our pipelines when we ported them to be nf-core instead of SciLifeLab (it wasn't immediate, an uppmax
config profile shipped with all pipelines for a long time). But this is really the heart of nf-core: that pipelines are generalised as much as is possible so as to be first-class pipelines for anyone anywhere.
Stupid "Close with comment" button is too close to the "Comment" button, sorry..
Community review for initial release. Aim is to get at least two PR approvals from people in the nf-core community who are not the main developers of this pipeline ✅
https://nf-co.re/developers/adding_pipelines#making-the-first-release