Closed: nschcolnicov closed this 1 month ago
A few things to discuss:
nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 8587df1

- ✅ 190 tests passed
- ❔ 3 tests were ignored
- ❗ 6 tests had warnings
A few questions for sustainability (not trying to just tear down your hard work):

- Is a custom Docker image required here? Could it just be an `environment.yml`, and then we build the image with wave? Is samshee on bioconda?
- Could this be an nf-core/module?
- Any possibility that nf-validation or nf-schema could be used instead? Is this just a fancy way to call a JSON schema against the CSV?
Hi @edmundmiller, no worries! Thank you for reviewing these changes.
@nf-core-bot fix linting
Also chiming in here @edmundmiller - Nicolas is working with us to get this in, as we saw there are several rules that need to be enforced to verify that the Illumina samplesheets are accepted by Illumina bcl2fastq/bclconvert at least. Until now, we ran into issues several times where people had slight deviations from the standard (e.g. a `uM` unit or a `(DNA/RNA)` suffix) in their samplesheets, which took a while until the pipeline failed.
Samshee, though new, is looking great and can validate the Illumina samplesheet very nicely. Technically this might also be feasible via nf-validation, but that might take a while, and the rules would be more or less a duplication of what samshee already does. We will use nf-validation to validate the pipeline samplesheet, though.
As far as I can tell, samshee is not available in bioconda, is this a requirement for building the image with wave? I'm not familiar with this approach. UPDATE: I replaced the docker image with wave.
Awesome. I'll add an `environment.yml` with what I'm talking about, update the container, and add a Singularity image.
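For context, the kind of `environment.yml` being suggested might look like the sketch below. Since samshee does not appear to be on bioconda, installing it via pip inside the conda environment is one assumed workaround; the channels, Python version, and pip entry here are illustrative and not taken from the PR:

```yaml
# Illustrative sketch only - not the actual file added in the PR.
name: samplesheet_validation
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11
  - pip
  - pip:
      - samshee  # assumed pip install, since samshee is not on bioconda
```

Wave can then build a container from an environment file like this, avoiding a hand-maintained custom Dockerfile.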
This could be an nf-core module, which is why I already added some of the additional files required, in case we decide to do so.
Sweet, just checking because others might find it useful.
What would be the benefit of having it inside nf-validation or nf-schema? I didn't develop this tool; would it require considerable effort to do this?
You can talk to @maxulysse more about this one 😆 Essentially spinning up a process just to validate is slow when this could just be done with the main Nextflow process. It might also be more maintainable instead of a custom script, it could be a few lines of groovy.
@apeltzer That all sounds good, just wanted to make sure the alternatives were thought of so when someone who's a stickler for the rules comes along, I can point to a follow-up issue.
Any possibility that nf-validation or nf-schema could be used instead? Is this just a fancy way to call a json schema against the CSV?
samshee also has some custom Python code for additional validation; it does not just use a JSON schema. See e.g. https://github.com/lit-regensburg/samshee/blob/main/src/samshee/validation.py#L177
Also an illumina samplesheet contains multiple [Sections]. Samshee first parses the different sections and then applies different schemas to each of them.
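To make the idea concrete, here is a hand-rolled sketch of that two-step approach (this is not samshee's actual API; the function names and the example check are hypothetical): first split the samplesheet into its `[Sections]`, then run a section-specific check on each one.

```python
# Sketch of section-wise samplesheet validation (illustrative, not samshee).

def parse_sections(text):
    """Split an Illumina-style samplesheet into {section_name: rows}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("["):
            # "[Header]" (possibly with trailing commas) -> "Header"
            current = line[1:line.index("]")]
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split(","))
    return sections

def check_bclconvert_data(rows):
    """Hypothetical per-section check: Sample_IDs must not contain
    spaces or parentheses (the kind of deviation bclconvert rejects)."""
    header, *records = rows
    idx = header.index("Sample_ID")
    errors = []
    for rec in records:
        sid = rec[idx]
        if any(ch in sid for ch in " ()"):
            errors.append(f"invalid Sample_ID: {sid!r}")
    return errors

sheet = """[Header]
FileFormatVersion,2
[BCLConvert_Data]
Sample_ID,Index
sample1,ACGT
bad (DNA),TTTT
"""
sections = parse_sections(sheet)
errs = check_bclconvert_data(sections["BCLConvert_Data"])
print(errs)  # one error, for the parenthesised Sample_ID
```

samshee does the same in a more principled way, applying a JSON schema plus extra Python checks per section, which is why a plain "one schema against the CSV" approach falls short.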
All comments were addressed, @edmundmiller could you give it a final look and see if your requested changes have been addressed?
@nf-core-bot fix linting
Hey folks - please continue in this one here: https://github.com/nf-core/demultiplex/pull/238 as this one is editable by anyone here on demux (I wanted to fix linting and some smaller bits myself, but your PR was from your fork so I couldn't edit anything...) :)
PR created to address this issue: https://github.com/nf-core/demultiplex/issues/222
PR checklist

- Make sure your code lints (`nf-core lint`).
- Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- Check for unexpected warnings in the debug profile (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).