Closed crosenth closed 1 year ago
Chris, I think the design that Sam described placed each sample on a single row like this:
"sampleid,...,batch,R1,R2,I1,I2"
where we'd include columns for additional labels and metadata (need to look at what exactly the lab is providing). So no need to specify direction, and we'd include a path for each of R1 and R2.
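For illustration, here is a minimal Python sketch of reading a one-row-per-sample sheet in this shape. The function name `read_samplesheet`, the extra `label` column, and the file names are assumptions made up for the example; none of them come from dada2-nf or from the lab's actual export.

```python
import csv
import io

# Columns the proposed layout requires; any other column is treated as
# pass-through metadata and kept untouched.
REQUIRED = ["sampleid", "batch", "R1", "R2", "I1", "I2"]


def read_samplesheet(handle):
    """Return one dict per sample, failing early if a required column is absent."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    return list(reader)


# Hypothetical sheet with one extra metadata column ("label").
example = io.StringIO(
    "sampleid,label,batch,R1,R2,I1,I2\n"
    "s1,control,b1,s1_R1.fastq.gz,s1_R2.fastq.gz,s1_I1.fastq.gz,s1_I2.fastq.gz\n"
)
rows = read_samplesheet(example)
```

Because each row carries both R1 and R2, no direction column is needed at this stage.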
We should also think about how we are specifying primer and adapter sequences - should it be here or in the config?
Great, that works, and any additional columns can be passed through.
I don't think I've heard the name "Malt" before, but I'll assume that it's the LIMS-associated software which is preparing the data generated by the sequencer to be run through dada2-nf.
One approach which might be the least disruptive would be to give the user the option to provide the full samplesheet using the format which is being used currently (sampleid,batch,direction,fastq,index1,index2). If that samplesheet is provided, then the manifest reconstruction step is skipped. However, all of your current automation could still stay in place as-is.
While I did mention sampleid,...,batch,R1,R2,I1,I2 as being my default samplesheet format, it's trivial to produce sampleid,batch,direction,fastq,index1,index2 as well. The only requirement for Cirro is to remove the behavior using fastq-file.txt for inputs, which doesn't play well with object storage.
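As a rough sketch of why the conversion is simple, the function below expands one wide row into two long rows, one per read direction. The function name `wide_to_long` is hypothetical, and using the literal strings "R1" and "R2" as the `direction` values is an assumption; the thread does not say what values the pipeline expects there.

```python
def wide_to_long(row):
    """Expand one wide samplesheet row into one row per read direction.

    Assumption: the direction column holds "R1" or "R2", and both long rows
    repeat the same index1/index2 paths.
    """
    base = {"sampleid": row["sampleid"], "batch": row["batch"]}
    return [
        {
            **base,
            "direction": direction,
            "fastq": row[direction],
            "index1": row["I1"],
            "index2": row["I2"],
        }
        for direction in ("R1", "R2")
    ]


# Hypothetical wide row, as it might come out of csv.DictReader.
example_row = {
    "sampleid": "s1",
    "batch": "b1",
    "R1": "s1_R1.fastq.gz",
    "R2": "s1_R2.fastq.gz",
    "I1": "s1_I1.fastq.gz",
    "I2": "s1_I2.fastq.gz",
}
long_rows = wide_to_long(example_row)
```

Extra metadata columns are dropped here for brevity; a real conversion would need to decide whether to carry them along.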
Re: primer and adapter sequences – my inclination would be to specify them in the global scope using params. If alternatively you see a need for using different primers for different samples, then it would make sense to specify them in the samplesheet.
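If per-sample primers ever became necessary, one way to combine the two options is a global default with an optional samplesheet override. The sketch below assumes hypothetical names throughout: `fwd_primer` and `rev_primer` are not existing dada2-nf params or columns, and the sequences are placeholders.

```python
PRIMER_KEYS = ("fwd_primer", "rev_primer")  # hypothetical names


def primers_for(row, params):
    """Use the samplesheet value when present and non-empty, else the global param."""
    return {
        key: (row.get(key) or "").strip() or params[key]
        for key in PRIMER_KEYS
    }


# Placeholder sequences, for illustration only.
global_params = {"fwd_primer": "AAAA", "rev_primer": "TTTT"}
override_row = {"sampleid": "s1", "fwd_primer": "CCCC"}
default_row = {"sampleid": "s2"}

s1_primers = primers_for(override_row, global_params)
s2_primers = primers_for(default_row, global_params)
```

With this arrangement, a samplesheet without primer columns behaves exactly as if the primers were only set globally.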
Let's go with your default samplesheet format sampleid,...,batch,R1,R2,I1,I2 with no fastq-file.txt as Noah stated.
Yes Malt is our in-house LIMS application and we will program that from our end
Just wanted to make sure I'm clear -- do you like the "make no changes to the existing inputs" approach? Or do you want to go ahead and make breaking changes that will then require changes upstream in Malt?
Sam, what does the ,..., mean for the samplesheet format?
I copied that from Noah's comment -- I assumed he was referring to additional columns with unspecified content
That's right, "..." just indicates additional columns that will be ignored but included in the sample sheet copied to the output directory.
Since most of the time the fastqs are all in the same directory, I'm considering a convention of sampleid,...,batch,datadir,R1,R2,I1,I2. If datadir has a value, it will be used as the parent directory for whatever path (typically just a filename) is provided for each of the files. Any thoughts on this? Cirro can always modify it if necessary.
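A minimal sketch of the datadir convention, assuming rows have already been read into dicts. The function name `resolve_paths` and the bucket path are made up for the example. Plain string joining is used instead of `pathlib`, because `pathlib` collapses the double slash in object-storage URIs such as `s3://...`.

```python
FILE_COLUMNS = ("R1", "R2", "I1", "I2")


def resolve_paths(row):
    """Prefix each file column with datadir when datadir has a value."""
    datadir = (row.get("datadir") or "").strip()
    resolved = dict(row)  # leave the caller's row unmodified
    if datadir:
        for col in FILE_COLUMNS:
            if row.get(col):
                # String join keeps URI schemes like "s3://" intact.
                resolved[col] = datadir.rstrip("/") + "/" + row[col]
    return resolved


# Hypothetical rows: one with a datadir, one without.
with_dir = resolve_paths({
    "sampleid": "s1", "batch": "b1", "datadir": "s3://example-bucket/run1/",
    "R1": "s1_R1.fastq.gz", "R2": "s1_R2.fastq.gz",
    "I1": "s1_I1.fastq.gz", "I2": "s1_I2.fastq.gz",
})
without_dir = resolve_paths({
    "sampleid": "s2", "batch": "b1", "datadir": "",
    "R1": "/data/s2_R1.fastq.gz", "R2": "/data/s2_R2.fastq.gz",
    "I1": "/data/s2_I1.fastq.gz", "I2": "/data/s2_I2.fastq.gz",
})
```

When datadir is empty, the paths are passed through unchanged, so sheets with full paths keep working.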
That sounds fine to me, @nhoffman
Before I dive in, I'd like to make sure that we're all on the same page for the other points which have come up:
@sminot -
That sounds great. I'm making my changes on https://github.com/nhoffman/dada2-nf/tree/cloud_compatibility. I'll make a PR once the tests run to completion in Cirro.
Current process is a manifest with "sampleid,batch" columns with Malt creating an additional fastq-file.txt file with fastq paths. The pipeline has a starting channel shape of "sampleid,batch,direction,fastq,index1,index2". Still need to check in with Sam on what is best for Cirro. If Malt and Cirro can generate the required channel shape "sampleid,batch,direction,fastq,index1,index2" that would simplify the pipeline.