Open rrahn opened 4 years ago
Some questions in the core meeting:
2020-06-18:
We discussed whether to make the template parameter for the (un)compressed data layout easier to set.
#include <utility>
#include <vector>

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/range/views/kmer_hash.hpp>
#include <seqan3/search/dream_index/technical_binning_directory.hpp>

using seqan3::operator""_dna4;

int main()
{
    seqan3::ibf_config cfg{seqan3::bin_count{8u},
                           seqan3::bin_size{1ULL << 16},
                           seqan3::hash_function_count{2u}};

    std::vector<std::vector<seqan3::dna4>> technical_bins{"ACTGACTGACTGATC"_dna4,
                                                          "GTGACTGACTGACTCG"_dna4,
                                                          "AAAAAAACGATCGACA"_dna4};

    auto v = seqan3::views::kmer_hash(seqan3::ungapped{5u});
    seqan3::technical_binning_directory tmp{technical_bins, std::move(v), cfg};

    // :(
    seqan3::technical_binning_directory<decltype(v), seqan3::data_layout::compressed> tbd{tmp};

    auto & result = tbd.bulk_contains(0);
    seqan3::debug_stream << result << '\n'; // [0,0,1,0,0,0,0,0]
}
Possible Solutions:
technical_binning_directory.compress, which creates a compressed binning directory:
seqan3::technical_binning_directory tmp{technical_bins, std::move(v), cfg};
//before
seqan3::technical_binning_directory<decltype(v), seqan3::data_layout::compressed> tbd{tmp};
// after
auto && compressed_tbd = tmp.compress();
basic_technical_binning_directory<view_adaptor_t, compressed/uncompressed> and two types:
technical_binning_directory<view_adaptor_t> = basic_technical_binning_directory<view_adaptor_t, uncompressed>
compressed_technical_binning_directory<view_adaptor_t> = basic_technical_binning_directory<view_adaptor_t, compressed>
seqan3::technical_binning_directory tmp{technical_bins, std::move(v), cfg};
//before
seqan3::technical_binning_directory<decltype(v), seqan3::data_layout::compressed> tbd{tmp};
// after
seqan3::compressed_technical_binning_directory compressed_tbd = std::move(tmp);
// or
seqan3::compressed_technical_binning_directory compressed_tbd{technical_bins, std::move(v), cfg};
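To make the two proposals easier to compare, here is a minimal standalone sketch that combines them. All names in it (`basic_directory`, `directory`, `compressed_directory`, the local `data_layout` enum) are hypothetical stand-ins, the `view_adaptor_t` parameter is dropped, and "compressing" is only a copy of the flags. It illustrates the type-level API only, not how seqan3 would implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for seqan3::data_layout.
enum class data_layout { uncompressed, compressed };

// Toy directory that stores one flag per bin (assumption: real storage differs per layout).
template <data_layout layout>
class basic_directory
{
public:
    basic_directory() = default;

    explicit basic_directory(std::vector<bool> bins) : bins_(std::move(bins)) {}

    // Proposal 2: implicit conversion, enabled only for uncompressed -> compressed.
    template <data_layout other_layout,
              typename = std::enable_if_t<layout == data_layout::compressed &&
                                          other_layout == data_layout::uncompressed>>
    basic_directory(basic_directory<other_layout> const & other) : bins_(other.bins()) {}

    // Proposal 1: a member function that returns the compressed counterpart.
    basic_directory<data_layout::compressed> compress() const
    {
        static_assert(layout == data_layout::uncompressed,
                      "only an uncompressed directory can be compressed");
        return basic_directory<data_layout::compressed>(bins_);
    }

    bool contains(std::size_t bin) const
    {
        assert(bin < bins_.size());
        return bins_[bin];
    }

    std::vector<bool> const & bins() const noexcept { return bins_; }

private:
    std::vector<bool> bins_{};
};

// The two user-facing names from proposal 2.
using directory = basic_directory<data_layout::uncompressed>;
using compressed_directory = basic_directory<data_layout::compressed>;
```

With this, `auto c = plain.compress();` and `compressed_directory c = plain;` both yield a `compressed_directory` from a `directory plain`, and neither call site has to spell out the layout template argument.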
Resolution:
Leave technical_binning_directory as it is. Let's see what it looks like once we actually have to use this data structure.
Thought dump:
Maybe a seqan3::binning_directory is the wrong solution. What we want is a convenient way to build the seqan3::interleaved_bloom_filter from sequences given a configuration.
Maybe a seqan3::ibf_factory is a better way of providing this. It would take a configuration and construct an IBF from it.
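A rough sketch of what such a factory could look like, to make the idea concrete. It is self-contained and does not use seqan3: `sketch_config`, `sketch_filter` and `sketch_ibf_factory` are hypothetical names, sequences are plain strings, and a single `std::hash` stands in for the real hash functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical configuration, loosely modelled on the ibf_config used above.
struct sketch_config
{
    std::size_t bin_count;
    std::size_t bin_size;  // number of bits per bin
    std::size_t kmer_size;
};

// Toy filter: bin_size rows of bin_count bits each, so the bins are interleaved.
struct sketch_filter
{
    std::size_t bin_count;
    std::size_t bin_size;
    std::vector<bool> bits;
};

// The factory owns the configuration and builds a filter from one sequence per bin.
class sketch_ibf_factory
{
public:
    explicit sketch_ibf_factory(sketch_config cfg) : cfg_(cfg)
    {
        assert(cfg_.bin_count > 0 && cfg_.bin_size > 0 && cfg_.kmer_size > 0);
    }

    sketch_filter operator()(std::vector<std::string> const & technical_bins) const
    {
        assert(technical_bins.size() <= cfg_.bin_count);
        sketch_filter filter{cfg_.bin_count, cfg_.bin_size,
                             std::vector<bool>(cfg_.bin_count * cfg_.bin_size, false)};

        for (std::size_t bin = 0; bin < technical_bins.size(); ++bin)
        {
            std::string_view const seq{technical_bins[bin]};
            // Hash every k-mer and set this bin's bit in the selected row.
            for (std::size_t pos = 0; pos + cfg_.kmer_size <= seq.size(); ++pos)
            {
                std::size_t const h = std::hash<std::string_view>{}(seq.substr(pos, cfg_.kmer_size));
                filter.bits[(h % cfg_.bin_size) * cfg_.bin_count + bin] = true;
            }
        }
        return filter;
    }

private:
    sketch_config cfg_;
};
```

The point of the design is that the k-mer view, the configuration and the construction loop live in the factory, so the resulting filter type stays free of any view adaptor template parameter.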
### Description
Build an IBF over the k-mers of the technical bins. Given this IBF, allow searching for a query and report all bins that contain at least one k-mer of the query.
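The intended behaviour can be illustrated with a small standalone toy. This is a sketch under assumptions, not the seqan3 API: `toy_ibf`, `build` and `find_bins` are hypothetical names, sequences are plain strings, and the hash functions are `std::hash` mixed with arbitrary constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Toy interleaved Bloom filter; not seqan3::interleaved_bloom_filter.
class toy_ibf
{
public:
    toy_ibf(std::size_t bin_count, std::size_t bin_size, std::size_t hash_count) :
        bin_count_(bin_count), bin_size_(bin_size), hash_count_(hash_count),
        bits_(bin_count * bin_size, false)
    {
        assert(bin_count > 0 && bin_size > 0 && hash_count > 0);
    }

    std::size_t bin_count() const noexcept { return bin_count_; }

    // Set the bit of `bin` in every row selected by the hash functions.
    void emplace(std::string_view kmer, std::size_t bin)
    {
        assert(bin < bin_count_);
        for (std::size_t i = 0; i < hash_count_; ++i)
            bits_[row(kmer, i) * bin_count_ + bin] = true;
    }

    // One flag per bin: true if the bin may contain the k-mer (false positives possible).
    std::vector<bool> bulk_contains(std::string_view kmer) const
    {
        std::vector<bool> result(bin_count_, true);
        for (std::size_t i = 0; i < hash_count_; ++i)
        {
            std::size_t const offset = row(kmer, i) * bin_count_;
            for (std::size_t bin = 0; bin < bin_count_; ++bin)
                result[bin] = result[bin] && bits_[offset + bin];
        }
        return result;
    }

private:
    // i-th hash function: std::hash mixed with arbitrary constants (an assumption of this sketch).
    std::size_t row(std::string_view kmer, std::size_t i) const
    {
        std::uint64_t h = std::hash<std::string_view>{}(kmer);
        h ^= (i + 1) * 0x9E3779B97F4A7C15ULL;
        h *= 0xBF58476D1CE4E5B9ULL;
        h ^= h >> 31;
        return static_cast<std::size_t>(h % bin_size_);
    }

    std::size_t bin_count_;
    std::size_t bin_size_;
    std::size_t hash_count_;
    std::vector<bool> bits_;
};

// Build a filter with one bin per sequence and insert all k-mers of each sequence.
toy_ibf build(std::vector<std::string> const & technical_bins, std::size_t k,
              std::size_t bin_size, std::size_t hash_count)
{
    assert(k > 0);
    toy_ibf ibf{technical_bins.size(), bin_size, hash_count};
    for (std::size_t bin = 0; bin < technical_bins.size(); ++bin)
    {
        std::string_view const seq{technical_bins[bin]};
        for (std::size_t pos = 0; pos + k <= seq.size(); ++pos)
            ibf.emplace(seq.substr(pos, k), bin);
    }
    return ibf;
}

// Report the indices of all bins that may contain at least one k-mer of the query.
std::vector<std::size_t> find_bins(toy_ibf const & ibf, std::string_view query, std::size_t k)
{
    assert(k > 0);
    std::vector<bool> hit(ibf.bin_count(), false);
    for (std::size_t pos = 0; pos + k <= query.size(); ++pos)
    {
        std::vector<bool> const per_bin = ibf.bulk_contains(query.substr(pos, k));
        for (std::size_t bin = 0; bin < per_bin.size(); ++bin)
            hit[bin] = hit[bin] || per_bin[bin];
    }

    std::vector<std::size_t> bins;
    for (std::size_t bin = 0; bin < hit.size(); ++bin)
        if (hit[bin])
            bins.push_back(bin);
    return bins;
}
```

Built over the three example sequences from the snippet at the top with k = 5, a query of `AAAAAAA` always reports bin 2, since a Bloom filter has no false negatives. Other bins can only show up as false positives, which become rarer as `bin_size` grows. A query shorter than k has no k-mers and reports no bins.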
Additional resources:
- https://trello.com/c/Q8HFn4RO/64-protocol-of-the-meeting-09032020-missing
- https://docs.google.com/document/d/1LxDRXm4kLMuFYyRBnml4uZttOUIzu3eCsj78jfODz2g
### Acceptance Criteria

### Tasks

- [ ] task 1
- [ ] task 2
- [ ] task 3

### Definition of Done

- [ ] Implementation and design approved
- [ ] Unit tests pass
- [ ] Test coverage = 100%
- [ ] Microbenchmarks added and/or affected microbenchmarks < 5% performance drop
- [ ] API documentation added
- [ ] Tutorial/teaching material added
- [ ] Test suite compiles in less than 30 seconds (on travis)
- [ ] Changelog entry added