torognes / vsearch

Versatile open-source tool for microbiome analysis

Add option Sample like in Usearch #426

Closed: tiagobrc closed this issue 2 years ago

tiagobrc commented 3 years ago

Is there a way in vsearch to add sample identifiers to read labels, as usearch does with its -sample option?

I am trying to use vsearch, but this option does not seem to exist.

Can you confirm that this is the case, or point me to the equivalent parameter in vsearch?

Usearch explanation of the function:

Adding sample identifiers to read labels

"If multiple samples are combined into a single file as shown in some of the above examples, then you lose track of which read came from which sample. This is addressed by adding a sample identifier to each read label. The simplest method is to use the -sample option, e.g."

Cordially,

Tiago

colinbrislawn commented 3 years ago

Hello Tiago,

Check out the --relabel and --label_suffix options. You could also change read labels using the Linux sed command.

Alternatively, adding a --sample flag to vsearch could be helpful.
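
As a minimal sketch of the sed route, the function below appends a ";sample=NAME;" annotation to every FASTQ header line. It assumes GNU sed (the 1~4 address is a GNU extension), strict four-line FASTQ records, and a sample name without "/" or "&" characters. The helper name add_sample_label is made up for illustration and is not part of vsearch or usearch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read FASTQ on stdin, write FASTQ on stdout,
# appending ";sample=<name>;" to each header (lines 1, 5, 9, ...).
# Assumes GNU sed and unwrapped, four-line FASTQ records.
add_sample_label() {
    local sample="$1"
    # "1~4" selects every 4th line starting at line 1 (the header lines);
    # "s/$/.../" appends the annotation at the end of those lines.
    sed "1~4 s/\$/;sample=${sample};/"
}

# Example: annotate a single-read FASTQ stream
printf '@read1\nACGT\n+\nIIII\n' | add_sample_label SampleA
```

Selecting headers by line number rather than by a leading "@" avoids touching quality lines, which may also start with "@".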

Colin

frederic-mahe commented 3 years ago

Hi @tiagobrc

with vsearch you can read from streams and write to streams, which allows for very flexible and powerful pipelines. For instance, you can merge many fastq samples while retaining the name of the original sample in sequence headers:

# create individual samples
for SAMPLE_NAME in sample{1..9}.fastq ; do
    printf "@s\nA\n+\nI\n" > "${SAMPLE_NAME}"
done

# pool fastq samples, retain sample names in fastq headers
for SAMPLE_NAME in sample*.fastq ; do
    vsearch \
        --quiet \
        --fastx_filter "${SAMPLE_NAME}" \
        --relabel "${SAMPLE_NAME/.fastq/}_" \
        --fastqout -
done > all_samples.fastq

# clean up
rm sample{1..9}.fastq all_samples.fastq

You will end up with fastq headers formatted as "@SampleName_EntryNumber", for example "@sample1_1".

torognes commented 2 years ago

I think the vsearch (and usearch) option --label_suffix ";sample=SampleA;" is equivalent to the usearch option -sample SampleA. This option can currently be used with the fastq_mergepairs and fastx_revcomp commands.

In principle the --label_suffix option could be used with almost any command that writes FASTA or FASTQ files, so I will enable it with many more commands in the next release.
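
A rough sketch of the --label_suffix approach with one of the two commands that supported it at the time, fastx_revcomp. It assumes vsearch is installed and on PATH. The function name and file names are placeholders for illustration. Note that fastx_revcomp also reverse-complements the reads, so this only fits workflows where that is wanted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vsearch): reverse-complement a FASTQ file
# and append ";sample=<name>;" to every header via --label_suffix.
# Assumes vsearch is installed and on PATH.
revcomp_with_sample_suffix() {
    local fastq="$1"
    local sample="$2"
    vsearch \
        --quiet \
        --fastx_revcomp "${fastq}" \
        --label_suffix ";sample=${sample};" \
        --fastqout -
}

# Example call (SampleA.fastq is a placeholder file name):
# revcomp_with_sample_suffix SampleA.fastq SampleA > SampleA_rc.fastq
```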

I'll consider adding the --sample option and perhaps the use of the @ symbol with the --relabel option as well.

torognes commented 2 years ago

The --sample option has been added in commit 34df253533db53e5d2fe0f91395a750e9c7f5862. The new option and the --label_suffix option have been enabled for all commands that write FASTA or FASTQ files.

torognes commented 2 years ago

Added in version 2.21.0, which has just been released.
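
With the new option, the pooling loop shown earlier in this thread could be sketched as follows. This assumes vsearch 2.21.0 or later on PATH and input files named after their samples. The function name pool_samples is a placeholder, not a vsearch feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pool several FASTQ files on stdout, tagging each
# read with its sample of origin through the --sample option.
# Assumes vsearch >= 2.21.0 is installed and on PATH.
pool_samples() {
    local fastq sample
    for fastq in "$@" ; do
        # derive the sample name from the file name: sample1.fastq -> sample1
        sample="$(basename "${fastq}" .fastq)"
        vsearch \
            --quiet \
            --fastx_filter "${fastq}" \
            --sample "${sample}" \
            --fastqout -
    done
}

# Example call (file names are placeholders):
# pool_samples sample*.fastq > all_samples.fastq
```

Unlike --relabel, this keeps the original read names and only adds a sample annotation to each header.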