seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de
Other
406 stars 81 forks source link

Question regarding the usage of `seqan3::fm_index_cursor` using `seqan3::bitpacked_sequence`. #3264

Closed Dhruv-mak closed 1 month ago

Dhruv-mak commented 1 month ago

Platform

Question

I am relatively new to C++ and I'm currently learning to use the SeqAn3 library. I've been following the documentation for the fm_index, but I encountered an issue that I cannot resolve.

Here's a code Snippet:

#include <seqan3/argument_parser/all.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/sequence_file/input.hpp>
#include <seqan3/search/fm_index/fm_index.hpp>
#include <string>
#include <vector>
#include <seqan3/alphabet/container/bitpacked_sequence.hpp>
#include <fstream>

int main(int argc, char const** argv) {
    using namespace seqan3::literals;

    seqan3::bitpacked_sequence<seqan3::dna4> seq{"ACGGTCAGGTTC"_dna4};
    seqan3::fm_index index{seq};

    // I don't know why the normal slicing is not working???
    // auto search(seq.begin(), seq.begin() + 4);
    auto search = seqan3::views::slice(seq, 0, 4);

    auto cursor = index.cursor();
    cursor.extend_right(search);
}

Steps to Reproduce

Expected Behavior

Observed Behavior

rrahn commented 1 month ago

Hi @Dhruv-mak,

thanks for reporting this. We will look into it.

SGSSGene commented 1 month ago

The error message this code produces is:

In file included from ../seqan3/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:16,
                 from ../seqan3/include/seqan3/alphabet/nucleotide/dna5.hpp:17,
                 from main.cpp:3:
../seqan3/include/seqan3/alphabet/detail/convert.hpp: In instantiation of ‘constexpr const std::array<seqan3::dna4, 4> seqan3::detail::convert_through_char_representation<seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type, seqan3::dna4>’:
../seqan3/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:86:21:   required from ‘constexpr seqan3::nucleotide_base<derived_type, size>::nucleotide_base(const other_nucl_type&) [with other_nucl_type = seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type; derived_type = seqan3::dna4; auto size = 4]’
../seqan3/include/seqan3/alphabet/nucleotide/dna4.hpp:78:19:   required from ‘bool seqan3::fm_index_cursor<index_t>::extend_right(seq_t&&) [with seq_t = std::ranges::subrange<seqan3::detail::random_access_iterator<seqan3::bitpacked_sequence<seqan3::dna4> >, seqan3::detail::random_access_iterator<seqan3::bitpacked_sequence<seqan3::dna4> >, std::ranges::subrange_kind::sized>&; index_t = seqan3::fm_index<seqan3::dna4, seqan3::single, sdsl::csa_wt<sdsl::wt_pc<sdsl::balanced_shape, sdsl::int_vector<1>, sdsl::rank_support_v<1, 1>, sdsl::select_support_scan<>, sdsl::select_support_scan<0>, sdsl::byte_tree<> >, 16, 10000000, sdsl::sa_order_sa_sampling<>, sdsl::isa_sampling<>, sdsl::plain_byte_alphabet> >]’
main.cpp:23:24:   required from here
../seqan3/include/seqan3/alphabet/detail/convert.hpp:43:50: error: use of deleted function ‘seqan3::bitpacked_sequence<alphabet_type>::reference_proxy_type::reference_proxy_type() [with alphabet_type = seqan3::dna4]’
   43 |             assign_char_to(to_char(assign_rank_to(i, in_t{})), ret[i]);
      |                                    ~~~~~~~~~~~~~~^~~~~~~~~~~
In file included from main.cpp:9:
../seqan3/include/seqan3/alphabet/container/bitpacked_sequence.hpp:105:9: note: declared here
  105 |         reference_proxy_type() = delete;
      |         ^~~~~~~~~~~~~~~~~~~~

@Dhruv-mak: the seqan3::fm_index does not benefit from using seqan3::bitpacked_sequence<seqan3::dna4>. If you have no requirement to use it I suggest using std::vector<seqan3::dna4> instead.

rrahn commented 1 month ago

I can confirm the following error:

In file included from ../seqan3-src/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:13,
                 from ../seqan3-src/include/seqan3/alphabet/nucleotide/dna15.hpp:14,
                 from ../seqan3-src/test/unit/alphabet/container/bitpacked_sequence_test.cpp:12:
../seqan3-src/include/seqan3/alphabet/detail/convert.hpp: In instantiation of 'constexpr const std::array<seqan3::dna4, 4> seqan3::detail::convert_through_char_representation<seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type, seqan3::dna4>':
../seqan3-src/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:83:21:   required from 'constexpr seqan3::nucleotide_base<derived_type, size>::nucleotide_base(const other_nucl_type&) [with other_nucl_type = seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type; derived_type = seqan3::dna4; auto size = 4]'
../seqan3-src/include/seqan3/alphabet/nucleotide/dna4.hpp:75:19:   required from here
../seqan3-src/include/seqan3/alphabet/detail/convert.hpp:40:50: error: use of deleted function 'seqan3::bitpacked_sequence<alphabet_type>::reference_proxy_type::reference_proxy_type() [with alphabet_type = seqan3::dna4]'
   40 |             assign_char_to(to_char(assign_rank_to(i, in_t{})), ret[i]);
      |                                    ~~~~~~~~~~~~~~^~~~~~~~~~~
In file included from ../seqan3-src/test/unit/alphabet/container/bitpacked_sequence_test.cpp:10:
../seqan3-src/include/seqan3/alphabet/container/bitpacked_sequence.hpp:102:9: note: declared here
  102 |         reference_proxy_type() = delete;

produced by the minimal example:

  using namespace seqan3::literals;

  seqan3::bitpacked_sequence<seqan3::dna4> source{"ACGGTCAGGTTC"_dna4};
  auto it = source.begin();
  seqan3::dna4 val = static_cast<seqan3::dna4>(*it);

The explicit cast of the proxy type returned by bitpacked sequence to a normal alphabet will call internally the default constructor of this proxy type causing the compile error, since the proxy type is not default constructible. So this needs some further digging into the alphabet mechanics which may take a while. For the time being, I suggest not using the bitpacked_sequence. Note internally the FM index uses the SDSL and this stores the given sequence in its own memory representation. So there is literally no benefit for using the bitpacked sequence if only the FM index is needed.

rrahn commented 1 month ago

Interestingly, this compiles:

using namespace seqan3::literals;

seqan3::bitpacked_sequence<seqan3::dna4> source{"ACGGTCAGGTTC"_dna4};
auto it = source.begin();
seqan3::dna4 val = *it;

So implicit conversion seems to work but via a different code path. Maybe some general issue with the alphabet_proxy. I'll investigate.