Closed Statistic-Qin closed 1 year ago
There are a couple of ways to handle this.
You could remove the `;size=` annotation from the output by adding the `--xsize` option to the command that writes the original file. Then you should drop the `label_substr_match` option.
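To make concrete what stripping the annotation does to the headers, here is a minimal sketch that uses `sed` rather than vsearch. It is only an illustration of the resulting header format, under the assumption that the annotation always has the form `;size=<integer>;`; the file names `input.fasta` and `stripped.fasta` are hypothetical.

```shell
# Small example input with abundance annotations (hypothetical file name)
printf '>s1;size=2;\nAAAAA\n>s2;size=1;\nAAAAT\n' > input.fasta

# On header lines only, delete ";size=<integer>" and an optional trailing ";"
sed -E '/^>/ s/;size=[0-9]+;?//' input.fasta > stripped.fasta

cat stripped.fasta
```

Once the headers are bare labels like these, an exact match against a list of OTU labels can succeed without any substring matching.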
Alternatively, you could use the `label_words` option with `fastx_getseqs`, like this:
```shell
vsearch --fastx_getseqs input.fasta --fastaout output.fasta --label_words otus.txt
```
The `otus.txt` file should then contain the OTU labels (e.g. `OTU_2`), one per line.
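As a rough sketch, such a label file could be produced from an OTU table with standard tools. This assumes a tab-separated table with a single header line and the OTU label in the first column; the file name `otutab.txt` and its contents are made up for illustration and may not match your table.

```shell
# Hypothetical OTU table: tab-separated, one header line, OTU label in column 1
printf '#OTU ID\tsample1\tsample2\nOTU_1\t10\t3\nOTU_2\t0\t7\n' > otutab.txt

# Skip the header line and keep only the first column, one label per line
tail -n +2 otutab.txt | cut -f 1 > otus.txt

cat otus.txt
```

The resulting `otus.txt` holds `OTU_1` and `OTU_2` on separate lines, which is the shape described above.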
I hope this helps.
Assuming a fasta input:
```
>s1;size=2;
AAAAA
>s2;size=1;
AAAAT
```
`vsearch` can read from `stdin` and write to `stdout`, so it is possible to chain `vsearch` operations like this:
```shell
printf ">s1;size=2;\nAAAAA\n>s2;size=1;\nAAAAT\n" | \
vsearch \
--fastx_filter - \
--quiet \
--xsize \
--fastaout - | \
vsearch \
--fastx_getseqs - \
--quiet \
--fastaout - \
--label_word "s1"
>s1
AAAAA
```
Thanks! For the first approach, I removed the `--sizeout` option, so there is no longer a ";size=" string.
> Perhaps you could alternatively use the label_word option
As suggested by @torognes, the `label_word` option is the best way to match labels (headers without annotations):
--label_word string Specify a word to match in the sequence header. Words are defined as strings delimited by either the start or end of the header or by any symbol that is not a letter (A-Z, a-z) or digit (0-9). The comparison is case-sensitive.
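The difference between substring matching and the word rule quoted above can be illustrated with `grep`. This is only an emulation of the rule with a regular expression, not vsearch itself, and the headers below are invented for the example.

```shell
# Illustrative headers (not taken from the original files)
printf '>OTU_2;size=5\n>OTU_20;size=3\n>OTU_21;size=1\n' > headers.txt

# Substring match: OTU_2 is also a prefix of OTU_20 and OTU_21
substr_hits=$(grep -c -F 'OTU_2' headers.txt)

# Word match: the label must be bounded by the start or end of the header,
# or by a character that is neither a letter nor a digit
word_hits=$(grep -c -E '(^|[^A-Za-z0-9])OTU_2([^A-Za-z0-9]|$)' headers.txt)

echo "substring: $substr_hits, word: $word_hits"
# prints "substring: 3, word: 1"
```

This is why a substring match on short labels returns many unwanted sequences, while a word match returns only the intended one.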
I've added regression tests to the `vsearch` test suite: https://github.com/frederic-mahe/vsearch-tests/commit/6a52f32f54f8a985bb9d63170757db0e008e9a13
@Statistic-Qin please close the issue if your problem has been solved.
After a series of processing steps, I end up with a final FASTA file whose headers contain "size=x" (picture 1), along with an OTU table file. I used Linux commands to extract the OTU ID names and saved them in a text file (picture 2). I want to use the `--fastx_getseqs` command, but the entire header must match the ID names, so I get zero sequences. When I add the `--label_substr_match` option, I get lots of sequences... Is there any way to solve this problem?
![image](https://user-images.githubusercontent.com/40289105/166138977-6ab11de6-192c-491a-9ca7-d8dfd853615c.png)