seqan / product_backlog

This repository is used as product backlog for all SeqAn relevant backlog items. This is intended to organise the work for the team.

uint64_t is not an alphabet? #200

Open marehr opened 4 years ago

marehr commented 4 years ago

Description

#include <seqan3/alphabet/all.hpp>

static_assert(seqan3::alphabet<uint64_t>);

fails to compile.

The code states the following reason:

 * \attention
 * Note that `uint64_t` is absent from the list, because there is no corresponding
 * character type.

Is that really an issue? We have at least the alphabet_variant alphabet that can have a larger rank than a char can represent.

One problem: the alphabet size of uint64_t (2^64) is not representable in uint64_t itself, so the size would wrap around to 0.
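To make the wrap-around concrete, here is a minimal sketch. `number_of_values` is a hypothetical helper for illustration only, not part of SeqAn3; it computes "largest rank + 1" in a size type chosen by the caller.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a SeqAn3 API): the number of distinct values of
// rank_t, stored in size_type. Unsigned results wrap modulo 2^N, so the
// count only survives if size_type is strictly wider than rank_t.
template <typename rank_t, typename size_type>
constexpr size_type number_of_values()
{
    return static_cast<size_type>(
        static_cast<size_type>(std::numeric_limits<rank_t>::max()) + size_type{1});
}
```

With this sketch, `number_of_values<std::uint8_t, std::uint16_t>()` fits, while `number_of_values<std::uint64_t, std::uint64_t>()` wraps to 0, because standard C++ has no wider unsigned integer type to hold 2^64.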

Acceptance Criteria

Tasks

- [ ] task 1
- [ ] task 2
- [ ] task 3

### Definition of Done

- [ ] Implementation and design approved
- [ ] Unit tests pass
- [ ] Test coverage = 100%
- [ ] Microbenchmarks added and/or affected microbenchmarks < 5% performance drop
- [ ] API documentation added
- [ ] Tutorial/teaching material added
- [ ] Test suite compiles in less than 30 seconds (on travis)
- [ ] Changelog entry added

smehringer commented 4 years ago

> Is that really an issue?

What's the result of seqan3::to_char(std::numeric_limits<uint64_t>::max()) gonna be then?

marehr commented 4 years ago

A truncated char. We can also just define it as a semi_alphabet.

marehr commented 3 years ago

Core-Meeting 2021-04-12:

We talked about some solution approaches, like changing the meaning of alphabet_size:

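// Example: an alphabet whose ranks use the full range of uint8_t
using rank_type = uint8_t;
constexpr uint16_t alphabet_size = 256; // needs a wider type than rank_type
constexpr rank_type max_rank = 255;     // std::numeric_limits<rank_type>::max()

// Variant 1: count with a wider type and compare against alphabet_size
for (size_t rank = 0u; rank < alphabet_size; ++rank)
{}

// Variant 2: beware, this is an endless loop, e.g. when rank_type is uint8_t (rank <= 255 is always true)
for (rank_type rank = 0u; rank <= max_rank; ++rank) // max_rank is constexpr, the compiler should warn about the endless loop
{}

// Variant 3: hide the iteration behind a generic helper
for (alphabet && alph : enumerate_alphabet<alphabet_t>())
{}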

The compiler does not give any warning for the endless loop (even though I saw one at some point).
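For reference, one way to visit every rank exactly once without a wider counter type is to test for the maximum before incrementing. This is only a sketch; `for_each_rank` and `count_ranks` are hypothetical names and not the `enumerate_alphabet` discussed in the meeting.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: call fn once for every value of rank_t.
// The exit check runs after the body and before the increment, so the
// counter never wraps around and the loop always terminates.
template <typename rank_t, typename fn_t>
constexpr void for_each_rank(fn_t && fn)
{
    rank_t rank = 0;
    while (true)
    {
        fn(rank);
        if (rank == std::numeric_limits<rank_t>::max())
            break;
        ++rank;
    }
}

// Counts the visited ranks; the count itself is kept in std::size_t.
template <typename rank_t>
constexpr std::size_t count_ranks()
{
    std::size_t count = 0;
    for_each_rank<rank_t>([&count](rank_t) { ++count; });
    return count;
}
```

`count_ranks<std::uint8_t>()` visits all 256 values, including 255, whereas the `rank <= max_rank` loop from the notes never exits for that type.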

Another approach would be to use compiler extensions, such as [u]int128, to widen the size type. The problem would again be finding a generic way to iterate over the alphabet (enumerate_alphabet()).
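A minimal sketch of the extension-based idea, assuming GCC or Clang, where `unsigned __int128` is available as a non-standard extension. The names `wide_size_t` and `uint64_value_count` are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// unsigned __int128 is a GCC/Clang extension, not standard C++.
using wide_size_t = unsigned __int128;

// Hypothetical helper: the number of distinct uint64_t values (2^64),
// which fits into 128 bits without wrapping.
constexpr wide_size_t uint64_value_count()
{
    return static_cast<wide_size_t>(std::numeric_limits<std::uint64_t>::max()) + 1;
}
```

This keeps the size of a `uint64_t` alphabet non-zero, but it only moves the limit up one level: a 128-bit rank type would again have no wider type to store its size.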

marehr commented 3 years ago

This would allow UTF-32 on 32-bit machines (and in general make our library compile on 32-bit machines).

marehr commented 3 years ago

Moved to Release 3.2.