Closed: Shellfishgene closed this 1 year ago
Hi, thanks for pointing out this issue. It's difficult to say what is happening here without the actual sequences. Could you send me the error_seq.fq file? I already have the SILVA file.
Since the probability for Eukaryota in the example is just 0.63 and the cutoff is 0.60, I suspect there is some random variation in the results, because the SINTAX algorithm has built-in randomness. That could cause a match in some runs and no match in others, so you might get different results if you run it again.
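To illustrate why a confidence of 0.63 can land on either side of a 0.60 cutoff, here is a rough simulation. It assumes, as a simplification, that the reported confidence is the fraction of 100 independent bootstrap iterations that support the taxon. The iteration count, the independence assumption and the function name `simulate_confidence` are illustrative and are not taken from the vsearch code.

```python
import random

def simulate_confidence(true_support, iterations=100, rng=random):
    """Return the fraction of simulated bootstrap iterations supporting a taxon.

    Each iteration is modelled as an independent coin flip with probability
    `true_support` (an assumption made for illustration only).
    """
    hits = sum(1 for _ in range(iterations) if rng.random() < true_support)
    return hits / iterations

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sketch is repeatable
    cutoff = 0.60
    runs = [simulate_confidence(0.63, rng=rng) for _ in range(1000)]
    below = sum(1 for c in runs if c < cutoff)
    print(f"{below} of {len(runs)} simulated runs fall below the {cutoff} cutoff")
```

Under these assumptions a noticeable share of runs ends up below the cutoff, which would show up as "no match" for a sequence that matches in other runs.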
See the sequence below. Independently of the cutoff, is it expected behaviour that vsearch outputs only the ID when there is no match? I noticed this because usearch's sintax_summary command can't deal with that.
>A00808:1162:HGN3HDRX2:1:2268:21649:2973 1:N:0:AGACCTTG+GATGCTAC
AGCCAATTAAGATCCCAACTGGTTCACGTGGCTCACACTCCTACAACATGTTCTGTTCAGAGTATTTCAAGTCAGGTGAGAACCCTGATAATGTTTTCAAACACTATAAGGACAGATTATTTACATGCATTATAACTATTATAGACCATGGCTAAAATATAGGGTAACATT
Yes, that is expected behaviour: the ID followed by two empty columns, separated by tabs. It seems that usearch may always, or at least more often, give a match, no matter the confidence.
I see. I was just confused by #493, which states that there should always be four columns, even with no match. If this is the expected behaviour, I'll close this. Thanks!
Sorry, when there is no match, you'll have the ID in the first column followed by two or three empty columns. There will be a total of four columns if the --sintax_cutoff option was used, otherwise a total of three columns.
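For anyone post-processing the output, a minimal sketch of a tolerant parser might look like the following. It relies only on the layout described above (tab-separated, three or four columns, empty columns when there is no match). The field names `prediction`, `strand` and `filtered`, and the helper `parse_sintax_line` itself, are my own labels for illustration, not names used by vsearch.

```python
def parse_sintax_line(line):
    """Split one tab-separated output line and pad it to four fields.

    Field names are hypothetical labels chosen for this sketch.
    """
    fields = line.rstrip("\r\n").split("\t")
    # Pad short lines so downstream code can always index four columns.
    fields += [""] * (4 - len(fields))
    query_id, prediction, strand, filtered = fields[:4]
    return {
        "id": query_id,
        "prediction": prediction,
        "strand": strand,
        "filtered": filtered,
        "matched": bool(prediction),  # empty prediction column means no match
    }

if __name__ == "__main__":
    no_match = parse_sintax_line("seq1\t\t\t\n")
    print(no_match["id"], no_match["matched"])
```

Padding every record to four fields is one way to feed the results to a tool that expects a fixed column count, regardless of whether --sintax_cutoff was used.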
tests covering that issue (https://github.com/frederic-mahe/vsearch-tests/commit/9465796f7151b65c8b41e8f13d8064c90ecaa396)
Hi!
The sintax command sometimes outputs only the ID column for some of my sequences.
The minimal example given in #493 also produces only the ID in v2.22.1_linux_x86_64. However, that's for no match; in the example above usearch produces a match at least for Eukaryota.