sanjaynagi closed this 1 year ago
Hi Sanjay, I'll get right on this
Any luck with this yet @ChabbyTMD ?
Hi Sanjay,
I've hit a bit of a snag with this. bcl-convert produces fastq reads in the format:
Good point Trevor, I had completely forgotten that we need to rename the files. Let me find my old commands for how I did this.
Try these inside a shell command. It may be necessary to install rename or add it to the conda environment.
shell:
    """
    rename 's/S[[:digit:]]+_L001_R//' path/to/fastqs/* 2> {log}
    rename 's/_001//' path/to/fastqs/* 2>> {log}
    """
Had to look this up, but the first command gets us to something like sample95_1_001.fastq.gz, and the second command removes that final _001. This is tricky in general, and we have to hope files are always named like this. It's a bit poor that we can't customise the output of BCL-convert, but never mind.
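To make the effect of the two substitutions easier to check, here is a minimal Python sketch of the same renaming logic applied to file name strings only (it does not touch the filesystem). The function name `tidy_fastq_name` is hypothetical, and the input names are an assumption about what bcl-convert emits (something like `sample95_S1_L001_R1_001.fastq.gz`), so treat it as an illustration rather than the workflow's actual code.

```python
import re


def tidy_fastq_name(name: str) -> str:
    """Mimic the two rename substitutions on a single file name.

    Assumes names shaped like '<sample>_S<number>_L001_R<read>_001.fastq.gz';
    this shape is an assumption, not something the workflow guarantees.
    """
    # Step 1: drop the 'S<number>_L001_R' block (first match only),
    # e.g. sample95_S1_L001_R1_001.fastq.gz -> sample95_1_001.fastq.gz
    name = re.sub(r"S[0-9]+_L001_R", "", name, count=1)
    # Step 2: drop the first '_001' that remains,
    # e.g. sample95_1_001.fastq.gz -> sample95_1.fastq.gz
    name = re.sub(r"_001", "", name, count=1)
    return name


if __name__ == "__main__":
    for original in [
        "sample95_S1_L001_R1_001.fastq.gz",
        "sample95_S1_L001_R2_001.fastq.gz",
    ]:
        print(original, "->", tidy_fastq_name(original))
```

As with the rename commands, a sample ID that itself contains `_001` or an `S<number>_L001_R` pattern would be mangled, which is the "we have to hope files are always named like this" caveat.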
In terms of the snakemake input, I suggest we just designate the input as a whole directory, results/reads; then the output can be something like expand("{sampleID}_{n}.fastq.gz", sampleID=samples, n=[1,2]). It may mean that the output of the bcl-convert rule is directory("results/reads/").
Btw, that rule should not have the run statement as that's for running Python code; just use shell instead.
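Because a directory output hides the individual files from Snakemake, it may help to see which file names the suggested expand pattern implies. The sketch below enumerates them in plain Python (no Snakemake import) and reports any that are absent. The helper names, the example sample IDs, and the `results/reads` prefix on each path are assumptions for illustration.

```python
from itertools import product
from pathlib import Path


def expected_fastqs(samples, reads_dir="results/reads"):
    """List the paired-read file names implied by '{sampleID}_{n}.fastq.gz'."""
    # One entry per (sample, read number) combination, read numbers 1 and 2.
    return [f"{reads_dir}/{s}_{n}.fastq.gz" for s, n in product(samples, [1, 2])]


def missing_fastqs(samples, reads_dir="results/reads"):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_fastqs(samples, reads_dir) if not Path(p).exists()]


if __name__ == "__main__":
    samples = ["sampleA", "sampleB"]  # hypothetical sample IDs
    for path in expected_fastqs(samples):
        print(path)
```

A check like `missing_fastqs` could be run after the renaming step to catch files whose names did not follow the assumed pattern.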
Also, because the reads are now something we generate as part of the workflow, I think we should move the resources/reads folder to results/reads (throughout the workflow). This is useful because when you want to start the workflow from the beginning, you can just delete or move the whole results/ folder.
FYI, I've added information on how to contribute to AmpSeeker in the README, as it's been a while :) Thanks Trevor.
Need a rule at the start of the workflow to convert and demultiplex BCL files from the Illumina MiSeq output directory to FASTQ. The command should be something like -