seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de
Other
409 stars 83 forks source link

Alphabet conversion is not properly constrained #3267

Closed rrahn closed 3 months ago

rrahn commented 3 months ago

Does this problem persist on the current main?

Is there an existing issue for this?

Current Behavior

Explicit casting of a bitpacked proxy type to its actual value type causes a compile time error. See #3264 for more details.

Expected Behavior

It compiles.

Steps To Reproduce

#include <vector>

#include <seqan3/alphabet/container/bitpacked_sequence.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>

using namespace seqan3::literals;

seqan3::bitpacked_sequence<seqan3::dna4> source{"ACGGTCAGGTTC"_dna4};
auto it = source.begin();
seqan3::dna4 val = seqan3::dna4{*it}; // compile error - due to explicit conversion

Environment

- Operating system: macOS 14.5
- SeqAn version: main
- Compiler: gcc 13.3

Anything else?

One problem is this code: https://github.com/seqan/seqan3/blob/ca4d668390e35b4045ccd02d070927f8178ed2ce/include/seqan3/alphabet/detail/convert.hpp#L32-L44

Here, we only require in_t to be an alphabet type but this does not include that in_t is also default constructible. However, inside of the initialization of the conversion table the default constructor of in_t is called, causing the seen compile error.

To fix this the in_t must be constrained with std::default_initializable as well.

eseiler commented 3 months ago
diff --git a/include/seqan3/alphabet/detail/convert.hpp b/include/seqan3/alphabet/detail/convert.hpp
index c3ebda2a5..8405d6a9a 100644
--- a/include/seqan3/alphabet/detail/convert.hpp
+++ b/include/seqan3/alphabet/detail/convert.hpp
@@ -30,6 +30,7 @@ namespace seqan3::detail
  * \hideinitializer
  */
 template <alphabet in_t, alphabet out_t>
+    requires std::default_initializable<in_t>
 constexpr std::array<out_t, alphabet_size<in_t>> convert_through_char_representation
 {
     []() constexpr {
diff --git a/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp b/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp
index f1c7bb054..5f97b42bc 100644
--- a/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp
+++ b/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp
@@ -77,6 +77,7 @@ public:
     template <typename other_nucl_type>
         requires (!std::same_as<nucleotide_base, other_nucl_type>)
               && (!std::same_as<derived_type, other_nucl_type>) && nucleotide_alphabet<other_nucl_type>
+              && std::default_initializable<other_nucl_type>
     explicit constexpr nucleotide_base(other_nucl_type const & other) noexcept
     {
         static_cast<derived_type &>(*this) =

should work.

Unfortunately, it's not enough to constrain the array (actually, we don't have to), we also have to constrain the ctor.

We probably have this ctor in multiple alphabets... Maybe there is a way to construct the array if in_t is not default initializable?

eseiler commented 3 months ago

Without confirming via debugging, I figure that this is the implicit conversion:

https://github.com/seqan/seqan3/blob/ca4d668390e35b4045ccd02d070927f8178ed2ce/include/seqan3/alphabet/detail/alphabet_proxy.hpp#L173-L201

seqan::bitpacked_sequence::reference_proxy_type inherits from alphabet_proxy.

https://github.com/seqan/seqan3/blob/ca4d668390e35b4045ccd02d070927f8178ed2ce/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp#L77-L84

is the explicit ctor.

By constraining this one, we fall back to implicit conversion (Unless I'm mixing up conversion/casting rules).

rrahn commented 3 months ago

I can confirm these findings. And correctly constraining the explicit constructor should do it. To fix this, I am planning to add a detail concept called convertable_through_char_representation that should be used for these constructors.

eseiler commented 3 months ago

Something like that would also work, though it's not that nice...

diff --git a/include/seqan3/alphabet/detail/alphabet_proxy.hpp b/include/seqan3/alphabet/detail/alphabet_proxy.hpp
index 7be8044a8..953a59cd4 100644
--- a/include/seqan3/alphabet/detail/alphabet_proxy.hpp
+++ b/include/seqan3/alphabet/detail/alphabet_proxy.hpp
@@ -133,6 +133,8 @@ public:
     //!\brief The alphabet size.
     static constexpr auto alphabet_size = seqan3::alphabet_size<alphabet_type>;

+    using alphabet_t = alphabet_type;
+
     /*!\name Write functions
      * \brief All of these call the emulated type's write functions and then delegate to
      *        the assignment operator which invokes derived behaviour.
diff --git a/include/seqan3/alphabet/detail/convert.hpp b/include/seqan3/alphabet/detail/convert.hpp
index c3ebda2a5..0c172be14 100644
--- a/include/seqan3/alphabet/detail/convert.hpp
+++ b/include/seqan3/alphabet/detail/convert.hpp
@@ -30,18 +30,19 @@ namespace seqan3::detail
  * \hideinitializer
  */
 template <alphabet in_t, alphabet out_t>
-constexpr std::array<out_t, alphabet_size<in_t>> convert_through_char_representation
-{
-    []() constexpr {
+constexpr std::array<out_t, alphabet_size<in_t>> convert_through_char_representation{
+    []() constexpr
+    {
         std::array<out_t, alphabet_size<in_t>> ret{};

-        // for (decltype(alphabet_size<in_t>) i = 0; ...) causes indefinite compilation :(
-        for (auto i = decltype(alphabet_size<in_t>){0}; i < alphabet_size<in_t>; ++i)
-            assign_char_to(to_char(assign_rank_to(i, in_t{})), ret[i]);
+        static_assert(std::default_initializable<in_t> || requires { typename in_t::alphabet_t; });
+        using value_t = std::conditional_t<std::default_initializable<in_t>, in_t, typename in_t::alphabet_t>;
+
+        for (auto i = decltype(alphabet_size<value_t>){0}; i < alphabet_size<value_t>; ++i)
+            assign_char_to(to_char(assign_rank_to(i, value_t{})), ret[i]);

         return ret;
-    }()
-};
+    }()};
 // clang-format on

 } // namespace seqan3::detail
rrahn commented 3 months ago

Sure, that seems valid. However, it feels too specific to seqan3. In general, any third party alphabet proxy should work as well but now we need to constraint that the proxy type exposes the alphabet_t type for which in_t is a proxy for. And then we would also need to check whether that type is also default_initializable, since alphabets to not require this.