Closed susheelbhanu closed 1 year ago
Thanks for reporting it, sorry for the trouble.
I never encountered this issue.
I am speculating that the input file is also picked up as an output file by https://github.com/nf-core/ampliseq/blob/3b252d263d101879c7077eae94a7a3d714b051aa/modules/local/rename_raw_data_files.nf#L14
This might be because the sampleID D17_1 is the base name of your forward read file D17_1.fastq.gz.
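The suspected collision can be sketched as follows. This is a hedged illustration only: the sample name and file name are taken from the report above, while the renamed output file name and the glob pattern are assumptions about how the rename step behaves, not code copied from the module.

```python
from pathlib import Path
import tempfile

# If the sampleID equals the base name of the input file, a glob intended to
# collect renamed outputs can pick up the staged input file as well.
with tempfile.TemporaryDirectory() as d:
    work = Path(d)
    (work / "D17_1.fastq.gz").touch()    # staged input file (name from the report)
    (work / "D17_1_1.fastq.gz").touch()  # hypothetical renamed output
    matches = sorted(p.name for p in work.glob("D17_1*.fastq.gz"))
    print(matches)  # both files match, so downstream steps see the input twice
```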
If that is true, changing your samplesheet from
D17_1 /hdd0/susbus/nf_core/data/hebe_16S/00.RawData/D17/D17_1.fastq.gz /hdd0/susbus/nf_core/data/hebe_16S/00.RawData/D17/D17_2.fastq.gz
to
D17 /hdd0/susbus/nf_core/data/hebe_16S/00.RawData/D17/D17_1.fastq.gz /hdd0/susbus/nf_core/data/hebe_16S/00.RawData/D17/D17_2.fastq.gz
should do the trick. Could you test that?
Yeah, I think that was the issue indeed. I renamed my input files 🙈, 'cos I like to make things complicated. I suppose the rename would have been the easiest and simplest option. Thanks for responding so quickly though.
I know someone already raised this, but maybe an input file validation step would help in future releases to avoid these cases. I expect they might happen with replicate samples that one doesn't want to treat as separate runs in the input file.
Thanks again!
Yes, thanks, there is already some input validation going on and more is done in the next release, but I think this problem wouldn't be identified by any test! Potential solution: check whether the sampleID ends with _1 or _2, and if yes, whether it is identical to the file base name of forwardReads & reverseReads (before .fastq.gz) --> error message with request to change the sampleID. Will need to investigate further. Let's keep this issue open because it's clearly a bug.
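The proposed check could look roughly like the sketch below. This is not the pipeline's actual validation code; the function name is hypothetical, and only the rule described above (sampleID ending in _1/_2 and matching a read file's base name) is taken from the discussion. The example paths are the ones from the reporter's samplesheet.

```python
import re
from pathlib import Path

def check_samplesheet_row(sample_id: str, forward: str, reverse: str) -> None:
    """Hypothetical validation: reject a sampleID that ends in _1/_2 and is
    identical to the base name (before .fastq.gz) of a read file."""
    if not re.search(r"_[12]$", sample_id):
        return
    for path in (forward, reverse):
        base = Path(path).name.removesuffix(".fastq.gz")
        if base == sample_id:
            raise ValueError(
                f"sampleID '{sample_id}' is identical to the base name of "
                f"'{path}'; please change the sampleID (e.g. drop the trailing _1/_2)."
            )

# Passes: sampleID does not collide with either read file name.
check_samplesheet_row(
    "D17",
    "/hdd0/susbus/nf_core/data/hebe_16S/00.RawData/D17/D17_1.fastq.gz",
    "/hdd0/susbus/nf_core/data/hebe_16S/00.RawData/D17/D17_2.fastq.gz",
)
```

With sampleID D17_1 and the same files, the same call would raise the error message instead.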
Great, thank you!
@d4straub quick question: I have reads from Novogene where the barcode and primer sequences were apparently removed.
When I run the following:
nextflow run nf-core/ampliseq -r 2.6.1 -profile singularity --input sample_hebe_edited.tsv --FW_primer GTGCCAGCMGCCGCGGTAA --RV_primer CCGTCAATTCCTTTGAGTTT --outdir "./hebe_16Sresults" --max_cpus 24 --max_memory 256.GB
I'm getting the below issue:
The following samples had too few reads (<1) after trimming with cutadapt:
Is it better to run with the skip_cutadapt or the retain_untrimmed flag?
Thanks!
It might also be that you are using the wrong primer sequences.
Is it better to run the skip_cutadapt or the retain_untrimmed flags?
If it's fine for you to run ampliseq potentially twice, use --skip_cutadapt first. If a large portion of reads (let's say, >10 or 15%) are removed due to being flagged as chimeric, use --retain_untrimmed -resume instead of --skip_cutadapt. That should reduce chimeric reads considerably.
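Concretely, the two-pass approach above could be run as follows. The base command and the flags --skip_cutadapt, --retain_untrimmed, and -resume are taken from this thread; the paths and samplesheet are the reporter's and should be adjusted to your own setup.

```shell
# Pass 1: skip primer trimming entirely.
nextflow run nf-core/ampliseq -r 2.6.1 -profile singularity \
  --input sample_hebe_edited.tsv \
  --FW_primer GTGCCAGCMGCCGCGGTAA --RV_primer CCGTCAATTCCTTTGAGTTT \
  --outdir "./hebe_16Sresults" --max_cpus 24 --max_memory 256.GB \
  --skip_cutadapt

# Pass 2, only if a large portion of reads (>10-15%) were flagged as chimeric:
# keep untrimmed reads instead, resuming cached work from pass 1.
nextflow run nf-core/ampliseq -r 2.6.1 -profile singularity \
  --input sample_hebe_edited.tsv \
  --FW_primer GTGCCAGCMGCCGCGGTAA --RV_primer CCGTCAATTCCTTTGAGTTT \
  --outdir "./hebe_16Sresults" --max_cpus 24 --max_memory 256.GB \
  --retain_untrimmed -resume
```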
Such a question would be better suited for the nf-core Slack channel #ampliseq, see https://nf-co.re/join
Awesome, thank you. To my knowledge, I'm using the primer sequences the company provided. And thanks for the link to the Slack channel. Will use that going forward.
I added a fix (linked above) to the dev branch; it will give a proper error message with the request to change the sampleID, and will be in the next release. So I'll close this issue here.
Description of the bug
Hi,
I'm getting the below error on the cutadapt step:

This is what my sample input file for the particular sample looks like:

Incidentally, I looked in the tmp folder and it looks like the renaming/splitting step is creating the additional files that cutadapt is getting confused by. See below:

The relevant files are attached here: ampliseq.zip

Is there a way to get around this issue apart from renaming the original files from "_{1,2}" to "_R{1,2}"?

Thank you, Susheel
Command used and terminal output