Closed abde-ali-kagalwalla closed 6 months ago
Hi @abde-ali-kagalwalla,
thanks for reaching out and sorry for the late reply.
But if the alphabet size exceeds char size, multiple alphabets could be assigned the same character. Would that cause issues if we try to align sequences with such a larger alphabet size?
Yes, you understood that correctly, and yes, it would cause trouble, so it is probably not a good idea.
You can change the rank type of your alphabet to something that allows more values, e.g. uint16_t. Your alphabet then no longer offers a conversion to char. Depending on what you want to do, to_rank might be totally sufficient. For example, our cigar alphabet (seqan3::cigar) has uint32_t as its underlying rank type and a to_string function.
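As a rough illustration of the idea above, here is a minimal standalone sketch of an alphabet whose symbols are identified only by a 16-bit rank. It is plain C++ and does not use SeqAn3: the class name wide_symbol, the alphabet size of 1000, and the exact member signatures are assumptions made for this example, and a real SeqAn3 alphabet would have to satisfy the library's own alphabet requirements as described in the how-to guide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alphabet with 1000 symbols (an assumed size, too large for char).
// This is NOT a SeqAn3 type; it only illustrates a rank-only interface.
class wide_symbol
{
public:
    using rank_type = std::uint16_t;                  // holds ranks up to 65535
    static constexpr rank_type alphabet_size = 1000;  // assumed for this example

    // Numeric identity of the symbol; no char conversion is offered.
    rank_type to_rank() const noexcept
    {
        return rank_;
    }

    // Sets the symbol from a rank. The precondition is only checked in debug
    // builds, because assert is disabled when NDEBUG is defined.
    wide_symbol & assign_rank(rank_type rank) noexcept
    {
        assert(rank < alphabet_size);
        rank_ = rank;
        return *this;
    }

    friend bool operator==(wide_symbol lhs, wide_symbol rhs) noexcept
    {
        return lhs.rank_ == rhs.rank_;
    }

    friend bool operator!=(wide_symbol lhs, wide_symbol rhs) noexcept
    {
        return !(lhs == rhs);
    }

private:
    rank_type rank_{0};
};
```

Because every symbol keeps its own rank, two distinct symbols never collapse onto the same value, which is the problem a char conversion would run into once the alphabet has more symbols than char can represent.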
Hi @smehringer,
Thank you very much for the pointer to seqan3::cigar. I think to_rank would be sufficient. My use case is to create a custom alphabet with uint16_t as its rank type, create a custom scoring matrix for that alphabet similar to the BLOSUM matrices, and then use it for pairwise alignment of sequences built from the custom alphabet.
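To make that plan concrete, the following is a small standalone sketch of a substitution matrix indexed by rank pairs, together with a textbook Needleman-Wunsch global alignment score using a linear gap penalty. It deliberately does not use SeqAn3's alignment or scoring scheme API; the names rank_score_matrix and global_alignment_score, as well as all score values, are assumptions chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using rank_type = std::uint16_t;

// Hypothetical dense, symmetric substitution matrix addressed by symbol ranks.
class rank_score_matrix
{
public:
    // Every diagonal cell starts at `match`, every other cell at `mismatch`.
    rank_score_matrix(std::size_t size, int match, int mismatch)
        : size_{size}, scores_(size * size, mismatch)
    {
        for (std::size_t i = 0; i < size; ++i)
            scores_[i * size + i] = match;
    }

    // Overrides the score of one symbol pair, keeping the matrix symmetric.
    void set(rank_type a, rank_type b, int score)
    {
        assert(a < size_ && b < size_);
        scores_[a * size_ + b] = score;
        scores_[b * size_ + a] = score;
    }

    int score(rank_type a, rank_type b) const
    {
        assert(a < size_ && b < size_);
        return scores_[a * size_ + b];
    }

private:
    std::size_t size_;
    std::vector<int> scores_;
};

// Needleman-Wunsch global alignment score with a linear gap penalty,
// computed row by row so that only two rows are kept in memory.
int global_alignment_score(std::vector<rank_type> const & s,
                           std::vector<rank_type> const & t,
                           rank_score_matrix const & matrix,
                           int gap)
{
    std::vector<int> prev(t.size() + 1);
    std::vector<int> cur(t.size() + 1);

    for (std::size_t j = 0; j <= t.size(); ++j)
        prev[j] = static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= s.size(); ++i)
    {
        cur[0] = static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= t.size(); ++j)
        {
            cur[j] = std::max({prev[j - 1] + matrix.score(s[i - 1], t[j - 1]), // substitution
                               prev[j] + gap,                                  // gap in t
                               cur[j - 1] + gap});                             // gap in s
        }
        std::swap(prev, cur);
    }
    return prev[t.size()];
}
```

A dense matrix for an alphabet of n symbols stores n * n entries, so for alphabets with several thousand symbols a sparse representation with a default mismatch score may be the more practical choice.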
Thank you very much for the pointer, really appreciate it!
You are very welcome. Feel free to reopen the issue or open a new one if more issues arise.
Platform
Question
Hi there,
I am new to this library and I am going through the tutorials to determine whether I could use it for my application. My application has a large alphabet with more than 100 symbols, which could potentially need more than 8 bits. I went through the how-to guide on creating your own alphabet (https://docs.seqan.de/seqan3/3-master-dev/howto_write_an_alphabet.html), and it mentions that the alphabet needs to provide an interface to convert to a char object. But if the alphabet size exceeds the char size, multiple symbols could be assigned the same character. Would that cause issues if we try to align sequences with such a large alphabet?