seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de

Question about large alphabet set #3220

Closed abde-ali-kagalwalla closed 6 months ago

abde-ali-kagalwalla commented 6 months ago

Question

Hi there,

I am new to this library and I am going through the tutorials to determine whether I could use it for my application. My application has a large alphabet with more than 100 symbols, which could potentially exceed what 8 bits can represent. I went through the how-to guide on writing your own alphabet (https://docs.seqan.de/seqan3/3-master-dev/howto_write_an_alphabet.html), and it mentions that the alphabet needs to provide an interface for converting to a char object. But if the alphabet size exceeds char size, multiple alphabets could be assigned the same character. Would that cause issues if we try to align sequences with such a larger alphabet size?

smehringer commented 6 months ago

Hi @abde-ali-kagalwalla,

thanks for reaching out and sorry for the late reply.

> But if the alphabet size exceeds char size, multiple alphabets could be assigned the same character. Would that cause issues if we try to align sequences with such a larger alphabet size?

Yes, you understood that correctly, and yes, it would cause trouble, so it is probably not a good idea.
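To make the collision concrete, here is a minimal standalone sketch (plain C++, no SeqAn3 involved). The alphabet size of 300 and the function names are hypothetical, chosen only to exceed the 256 values an 8-bit char can hold.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical alphabet size, chosen only to exceed the 256 values of an 8-bit char.
constexpr std::uint16_t hypothetical_alphabet_size = 300;

// Naive mapping that narrows a rank to 8 bits. unsigned char is used so that
// the conversion is a well-defined modulo-256 reduction.
constexpr unsigned char naive_to_char(std::uint16_t const rank) noexcept
{
    return static_cast<unsigned char>(rank);
}

// Counts how many distinct characters the naive mapping produces
// over all ranks of the hypothetical alphabet.
inline std::size_t distinct_chars()
{
    std::set<unsigned char> seen;
    for (std::uint16_t r = 0; r < hypothetical_alphabet_size; ++r)
        seen.insert(naive_to_char(r));
    return seen.size();
}
```

With 300 ranks, only 256 distinct characters come out, so for example ranks 0 and 256 become indistinguishable after the conversion.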

You can change the rank type of your alphabet to something larger, e.g. uint16_t. In that case, your alphabet would not provide a to_char conversion. Depending on what you want to do, to_rank might be entirely sufficient. For example, our cigar alphabet, seqan3::cigar, has uint32_t as its underlying rank type and a to_string function.
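A rough sketch of what such a rank-only type could look like is shown below. The class name wide_symbol and the size of 1000 are hypothetical. The member names (alphabet_size, to_rank, assign_rank) follow the how-to guide linked above, but the sketch is standalone C++ and has not been checked against the seqan3::semialphabet concept, so treat it as a starting point only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical alphabet with 1000 symbols; the name and size are illustrative only.
class wide_symbol
{
public:
    using rank_type = std::uint16_t;

    // Number of distinct symbols; must be representable in rank_type.
    static constexpr std::size_t alphabet_size = 1000;

    constexpr wide_symbol() noexcept = default;

    // Returns the numeric rank in [0, alphabet_size).
    constexpr rank_type to_rank() const noexcept { return rank_; }

    // Sets the symbol from a rank; the caller must pass a value below alphabet_size.
    constexpr wide_symbol & assign_rank(rank_type const rank) noexcept
    {
        assert(rank < alphabet_size);
        rank_ = rank;
        return *this;
    }

    // Symbols compare by rank. Note that there is deliberately no to_char/assign_char.
    friend constexpr bool operator==(wide_symbol a, wide_symbol b) noexcept { return a.rank_ == b.rank_; }
    friend constexpr bool operator!=(wide_symbol a, wide_symbol b) noexcept { return a.rank_ != b.rank_; }
    friend constexpr bool operator<(wide_symbol a, wide_symbol b) noexcept { return a.rank_ < b.rank_; }
    friend constexpr bool operator>(wide_symbol a, wide_symbol b) noexcept { return a.rank_ > b.rank_; }
    friend constexpr bool operator<=(wide_symbol a, wide_symbol b) noexcept { return a.rank_ <= b.rank_; }
    friend constexpr bool operator>=(wide_symbol a, wide_symbol b) noexcept { return a.rank_ >= b.rank_; }

private:
    rank_type rank_{};
};
```

Because the type never goes through char, ranks such as 256 and 0 stay distinct, which is the point of widening the rank type.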

abde-ali-kagalwalla commented 6 months ago

Hi @smehringer,

Thank you very much for the pointer to seqan3::cigar. I think to_rank would be sufficient. My use case is to create a custom alphabet with uint16_t, create a custom scoring matrix for the alphabet similar to the BLOSUM matrices, and then use that for pairwise alignment of sequences created using the custom alphabet.
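As a rough illustration of that plan, the sketch below uses a score table indexed by ranks and a textbook Needleman-Wunsch global alignment score. It is plain standalone C++ and does not use SeqAn3's scoring scheme or alignment API. The names rank_score_table and global_alignment_score, the linear gap penalty, and the match/mismatch defaults are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using rank_t = std::uint16_t;

// Dense score table indexed by the ranks of two symbols (hypothetical helper).
class rank_score_table
{
public:
    // Fills the table with `match` on the diagonal and `mismatch` elsewhere.
    rank_score_table(std::size_t const alphabet_size, int const match, int const mismatch) :
        size_{alphabet_size}, scores_(alphabet_size * alphabet_size, mismatch)
    {
        for (std::size_t i = 0; i < size_; ++i)
            scores_[i * size_ + i] = match;
    }

    // Overrides one entry symmetrically, as one would for a BLOSUM-like matrix.
    void set(rank_t const a, rank_t const b, int const score)
    {
        scores_[a * size_ + b] = score;
        scores_[b * size_ + a] = score;
    }

    // Looks up the score for a pair of ranks; both must be below the alphabet size.
    int score(rank_t const a, rank_t const b) const { return scores_[a * size_ + b]; }

private:
    std::size_t size_;
    std::vector<int> scores_;
};

// Global alignment score (Needleman-Wunsch) with a linear gap penalty,
// keeping only two rows of the dynamic programming matrix.
int global_alignment_score(std::vector<rank_t> const & s1,
                           std::vector<rank_t> const & s2,
                           rank_score_table const & table,
                           int const gap)
{
    std::vector<int> prev(s2.size() + 1), curr(s2.size() + 1);
    for (std::size_t j = 0; j <= s2.size(); ++j)
        prev[j] = static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= s1.size(); ++i)
    {
        curr[0] = static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= s2.size(); ++j)
        {
            int const diagonal = prev[j - 1] + table.score(s1[i - 1], s2[j - 1]);
            int const up = prev[j] + gap;
            int const left = curr[j - 1] + gap;
            curr[j] = std::max({diagonal, up, left});
        }
        std::swap(prev, curr);
    }
    return prev[s2.size()];
}
```

For an alphabet of 1000 symbols the dense table holds one million int entries, which is fine at this scale; a much larger alphabet would likely call for a sparse representation.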

Thank you very much for the pointer, really appreciate it!

smehringer commented 6 months ago

You are very welcome. Feel free to reopen the issue or open a new one if more issues arise.