Closed: marchoeppner closed this issue 2 months ago
...and just adding "--sample SAMPLE_ID" did the job...closing:
vsearch --fastq_merge $fwd --reverse $rev \\
--fastqout $merged \\
--threads ${task.cpus} \\
--fastq_eeout \\
-relabel ${meta.sample_id}. \\
--sample ${meta.sample_id}
I am glad to hear that you found a solution!
As a follow-up:
- --sample (see 31b328ab59cc142f4f9cb080d7cf16c410e4f45b)
- --otutabout's behavior when --sample is missing remains undocumented (it uses the sequence identifier, but truncates it after most punctuation characters, except _)
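The rule described in this comment can be sketched in a few lines of Python. This only illustrates the behavior as stated in this thread and is not taken from the vsearch source. The function name `sample_label` is hypothetical, and the third example header is made up. The sketch assumes that an explicit `;sample=` annotation (which `--sample` adds to the headers) takes precedence over the identifier prefix, as the fix reported above suggests. As a simplification, it truncates at every character other than a letter, digit, or underscore.

```python
import re

# Hypothetical helper mimicking the rule described in this thread;
# it is not taken from the vsearch source.
_SAMPLE_ANNOTATION = re.compile(r"(?:^|;)sample=([^;]*)")
_LEADING_ID = re.compile(r"[A-Za-z0-9_]+")


def sample_label(header: str) -> str:
    """Return the sample label a header would be assigned to (assumed rule)."""
    annotated = _SAMPLE_ANNOTATION.search(header)
    if annotated:
        # Assumption: an explicit ;sample=ID annotation wins.
        return annotated.group(1)
    # Otherwise keep only the leading run of letters, digits and underscores,
    # i.e. truncate at the first other punctuation character.
    leading = _LEADING_ID.match(header)
    return leading.group(0) if leading else ""


print(sample_label("MS-A1.1;ee=0.03493"))               # MS
print(sample_label("MS_A1.1;ee=0.03493"))               # MS_A1
print(sample_label("MS-A1.1;sample=MS-A1;ee=0.03493"))  # MS-A1
```

Under these assumptions, a hyphenated ID survives only when the annotation is present, which matches the report that adding `--sample` resolved the problem.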
Hi,
I am recreating a pipeline/workflow I found a little while ago, using Vsearch, which included an interesting "trick" to get the OTU counts per sample via usearch_global for all samples at once. However, I have noticed that usearch_global messes up my sample IDs if these contain a dash/hyphen - maybe because the way I am doing this isn't even considered by the developer.
The basic logic goes as follows:
(trimming and primer site removal are done outside of Vsearch)
For each library:
Note that I am attaching the sample ID to the fastq file via "-relabel"
And then for all samples combined:
-> Dereplicate -> Cluster Unoise -> Uchime3 Denovo -> Cluster size
I then proceed to quantify my samples against the OTU set as follows:
where $fastq is the combined set of all reads as emitted by the individual -fastq_merge steps.
And this is where it goes wrong, because --usearch_global clips my sample IDs:
Fastq header after "-fastq_merge": @MS-A1.1;ee=0.03493
Which --usearch_global turns into "MS", i.e. deletes everything after the "-"
Since I have many samples with similar names (MS-A1, MS-A2, etc) I am ending up with only a single count column "MS".
Bug, feature or am I not supposed to do it this way? ;)
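The collapse described in this report can be reproduced with a small Python sketch. It assumes, as a simplification of what is observed above, that the sample label is the leading run of letters, digits, and underscores in the read header. The helper name `count_columns` and every header other than the first are made up for illustration.

```python
import re
from collections import Counter


def count_columns(headers):
    """Count reads per sample label, cutting each label at the first
    character that is not a letter, digit or underscore (assumed rule)."""
    counts = Counter()
    for header in headers:
        match = re.match(r"[A-Za-z0-9_]+", header)
        counts[match.group(0) if match else ""] += 1
    return dict(counts)


hyphenated = ["MS-A1.1;ee=0.03493", "MS-A1.2;ee=0.1", "MS-A2.1;ee=0.2"]
underscored = [h.replace("-", "_") for h in hyphenated]

print(count_columns(hyphenated))   # {'MS': 3}
print(count_columns(underscored))  # {'MS_A1': 2, 'MS_A2': 1}
```

With hyphens, all three reads fall into a single "MS" column. With underscores, the two samples stay separate.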