Closed glichtenstein closed 5 months ago
nf-core lint
overall result: Failed :x:Posted for pipeline commit 6c63769
+| ✅ 157 tests passed |+
#| ❔ 2 tests were ignored |#
!| ❗ 4 tests had warnings |!
-| ❌ 10 tests failed |-
Question for Reviewers about setting up a test for the PR:
In our in-house testing using BCLConvert, we have a SampleSheet.csv
where one of the samples has wrong barcodes assigned (both Index
and Index2
). This leads to the generation of empty FASTQ files for that specific sample, which is one of the scenarios this PR aim to address.
However, for the nf-core GitHub tests, we are currently using this sample sheet for testing demultiplex: flowcell_input.csv. This sample sheet doesn't include any incorrect indices that would result in empty FASTQ files.
My question is: Would it be acceptable to modify this standard sample sheet so that it includes a sample (sample2
) with incorrect indices? This modification would enable us to test the pipeline's ability to handle empty FASTQ files in a controlled and predictable manner, which is crucial for validating the new features.
Perhaps, we could also modify one of these with a s25
and add some random barcodes that will yield empty s25 fastqs:
Your guidance on this matter would be greatly appreciated, as it will help ensure that our tests are comprehensive and reflective of real-world scenarios where such issues might arise.
Thanks for the PR!
Gonna wait for @matthdsm on this one.
In the meantime could you squash some of the commits? There's a lot of them that are linting or debugging, or if you'd like we can just squash the whole PR, there's not many changes made. 89 commits for 96 lines of code just seems like a lot.
@glichtenstein about the test data, yes I think it would make a lot of sense to have a samplesheet with faulty barcodes. Maybe it would be best to just create a new samplesheet so we have two test cases, one for a normal run and one with faulty barcodes.
I've been working on setting up a test dataset from a NovaSeq6000, would be nice if we could have second samplesheet there too (see https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/NovaSeq6000)
Thanks for the PR!
Gonna wait for @matthdsm on this one.
In the meantime could you squash some of the commits? There's a lot of them that are linting or debugging, or if you'd like we can just squash the whole PR, there's not many changes made. 89 commits for 96 lines of code just seems like a lot.
Thanks @edmundmiller ! Very good idea, I was not aware that one could squash commits, I will proceed with an interactive rebase, thus squashing all the commits into one that is more descriptive. Thanks for the heads-up.
LGTM!
Thanks for the review @matthdsm !
@glichtenstein about the test data, yes I think it would make a lot of sense to have a samplesheet with faulty barcodes. Maybe it would be best to just create a new samplesheet so we have two test cases, one for a normal run and one with faulty barcodes.
I've been working on setting up a test dataset from a NovaSeq6000, would be nice if we could have second samplesheet there too (see https://github.com/nf-core/test-datasets/blob/demultiplex/testdata/NovaSeq6000)
@Aratz Thanks for the reviews! Yes, that is a very good idea, I will get this dataset, and do some local testing with faulty barcodes, then I could open a branch for the NovaSeq6000 SampleSheet.csv. Do you rather have two SampleSheets? like a SampleSheet_faulty.csv? or would you prefer that I add a new sample row for the current SampleSheet.csv?
I think it's great to be able to complete the pipeline even with faulty barcodes! However, sometimes we might want the pipeline to crash to avoid risking this issue going under the radar. Maybe we could have an extra parameter e.g.
skip_empty_fastq
? (could be true by default).
This is a very good suggestion, we thought of it but it was going to require more business logic, but definitely it will be a good addition. I will explore options and come back to you. In this regards, shall I merge this PR into dev and then open a new branch for it, or would you suggest I keep working on the '--skip_empty_fastq' in the same PR?
Also, if I'm not mistaken
subworkflows/nf-core/bcl_demultiplex/main.nf
is automatically imported from https://github.com/nf-core/modules, so your PR should be made against that repository first.
Oh, I was not aware of that relation, I will take a look, and make the PR againt https://github.com/nf-core/modules if possible.
Thanks a lot.
@Aratz I have opened a PR in modules 🤞
Thanks for the PR!
Gonna wait for @matthdsm on this one.
In the meantime could you squash some of the commits? There's a lot of them that are linting or debugging, or if you'd like we can just squash the whole PR, there's not many changes made. 89 commits for 96 lines of code just seems like a lot.
Hi @edmundmiller, after the rebase to squash all the commits, I do not pass the nf-core inting, do you think something has changed with the tests?
@Aratz Thanks for the reviews! Yes, that is a very good idea, I will get this dataset, and do some local testing with faulty barcodes, then I could open a branch for the NovaSeq6000 SampleSheet.csv. Do you rather have two SampleSheets? like a SampleSheet_faulty.csv? or would you prefer that I add a new sample row for the current SampleSheet.csv?
Yes I think it would be best with two samplesheets (there is one in the .tar.gz
file so you might need to upload a new one too. I thinking both datasets can sit in the Novaseq6000
folder, and you can document why they differ in the README file.
This is a very good suggestion, we thought of it but it was going to require more business logic, but definitely it will be a good addition. I will explore options and come back to you. In this regards, shall I merge this PR into dev and then open a new branch for it, or would you suggest I keep working on the '--skip_empty_fastq' in the same PR?
I'd say you can close this one and open a new one with the automatic module update and work on the new parameter there :+1:
Oh, I was not aware of that relation, I will take a look, and make the PR againt https://github.com/nf-core/modules if possible. Thanks a lot.
Great! I'll check it out!
Closing this PR, it was opened on nf-core/modules instead: https://github.com/nf-core/modules/pull/4842
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).Title of PR:
"Enhanced Resilience to Empty or Invalid FASTQ Files with Logging Functionality"
Description:
Changes Made:
Introduction of a Log File:
${params.outdir}/invalid_fastqs.log
) for tracking empty or invalid FASTQ files, enhancing post-run analysis.Modified
generate_fastq_meta
Function:generate_fastq_meta
to gracefully handle empty or invalid FASTQ files, preventing pipeline interruption.logFile
as a new parameter for logging these occurrences.New Function -
appendToLogFile
:appendToLogFile
, a utility function for appending messages to the log file. This function improves maintainability and includes a check to handle Groovy GStrings.Reason for Changes:
appendToLogFile
and the parameterization oflogFile
align with best coding practices, enhancing code reusability and organization.Impact:
Use Cases Addressed:
Incorrect Barcode in Sample Sheet:
Library Preparation Errors:
invalid_fastqs.log
file will assist in pinpointing these samples for reprocessing or further laboratory investigation.