samuell opened this issue 1 year ago
That would be achieved, but is tblastn simpler and faster?
Perhaps! In my own quick try, it seemed that I needed to put my query sequence into a file before running it, but perhaps there is some way to do this more easily.
I can explore this option a little more.
Prerequisites
seqkit version
Describe your issue
I have a use case where I located a small "motif" in a protein sequence that I'm interested in finding again in the nucleotide sequence coding for the protein.
The sequence I was looking for, expressed as a regex, is the following, so let's use it as an example here ("." is, of course, any letter, as per standard regex syntax):
I would now like to be able to run seqkit grep not only against protein sequences but also against nucleotide ones.
Using a genetic code table, I can do this by manually converting the sequence into a (DNA) nucleotide regex like this one (where [XY] is a character class allowing either X or Y in one position):
Now, it would be useful not to have to do this translation manually, but rather to be able to do something similar to:
Of course, the same thing could be done using degenerate amino acid / base codes too, if that is preferred over regular expressions.
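The manual back-translation described above can be sketched roughly like this. It is a minimal illustration, not seqkit code, and it assumes the standard genetic code (NCBI table 1), the forward strand only, and a small regex subset: single amino acid letters, ".", simple "[...]" classes and "{n}" repeats. The helpers protein_regex_to_dna and find_motif are hypothetical names made up for this example.

```python
import itertools
import re

# Standard genetic code (NCBI translation table 1, assumed). itertools.product over
# "TCAG" yields codons in the order TTT, TTC, TTA, TTG, TCT, ..., matching this string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each one-letter amino acid code ("*" = stop) to the codons encoding it.
CODONS_FOR: dict[str, list[str]] = {}
for bases, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(bases))


def codon_group(amino_acids: str) -> str:
    """Non-capturing group matching any codon of any of the given amino acids."""
    codons = []
    for aa in amino_acids:
        if aa not in CODONS_FOR:
            raise ValueError(f"unsupported amino acid code: {aa!r}")
        codons.extend(CODONS_FOR[aa])
    return "(?:" + "|".join(codons) + ")"


def protein_regex_to_dna(pattern: str) -> str:
    """Back-translate a protein regex into a DNA regex (hypothetical helper).

    Only a small subset is handled: one-letter amino acids, "." (any codon),
    "[...]" classes without ranges or negation, and "{n}" / "{n,m}" repeats.
    "*" is read as a stop codon, not as a regex quantifier.
    """
    if not pattern:
        raise ValueError("empty pattern")
    pattern = pattern.upper()
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == ".":
            out.append("(?:[ACGT]{3})")  # any codon, stop codons included
            i += 1
        elif ch == "[":
            end = pattern.index("]", i)
            out.append(codon_group(pattern[i + 1:end]))
            i = end + 1
        elif ch == "{":
            end = pattern.index("}", i)
            out.append(pattern[i:end + 1])  # repeat applies to the preceding group
            i = end + 1
        else:
            out.append(codon_group(ch))
            i += 1
    return "".join(out)


def find_motif(protein_pattern: str, dna: str) -> list[tuple[int, int, str]]:
    """All (start, end, match) hits on the forward strand, overlapping hits included."""
    regex = re.compile(protein_regex_to_dna(protein_pattern))
    seq = dna.upper()
    hits = []
    pos = 0
    while pos <= len(seq) and (m := regex.search(seq, pos)) is not None:
        hits.append((m.start(), m.end(), m.group()))
        pos = m.start() + 1  # restart one base later so hits in other frames are kept
    return hits
```

For example, find_motif("M.V", "CCATGAAAGTTCC") gives [(2, 11, 'ATGAAAGTT')]; the reading frame of a hit is simply start % 3. The string returned by protein_regex_to_dna could presumably also be passed to seqkit grep as a regex sequence pattern (e.g. with its -s, -r and -p options). Note that six-codon amino acids such as L, R and S make the generated alternations fairly long.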