Closed ffranr closed 6 years ago
This unit tests captures the current functionality (which will be changed with this issue): https://github.com/iqbal-lab-org/gramtools/blob/51c58d6b08ef447229e52b5d4f94001effbd3733/libgramtools/tests/quasimap/test_quasimap.cpp#L236
This is not as trivial as I first thought. However, I'm getting there.
The non-trivial bit: can't just assign site-allele to an SA interval. The interval may contain indexes which are outside an allele. That is to say, handling reads which map exactly to (at least) two places: entirely within an allele and outside of a site altogether.
The code so far:
SearchStates handle_allele_encapsulated_state(const SearchState &search_state,
const PRG_Info &prg_info) {
SearchStates new_search_states = {};
bool cached_state = false;
SearchState new_search_state = {};
for (uint64_t sa_index = search_state.sa_interval.first;
sa_index <= search_state.sa_interval.second;
++sa_index) {
auto read_start_index = prg_info.fm_index[sa_index];
auto site_marker = prg_info.sites_mask[read_start_index];
bool read_within_site = site_marker != 0;
if (not read_within_site) {
if (cached_state) {
new_search_states.push_back(new_search_state);
cached_state = false;
}
continue;
}
auto allele_id = prg_info.allele_mask[read_start_index];
if (not cached_state) {
new_search_state = SearchState {
SA_Interval {sa_index, sa_index},
VariantSitePath {
VariantSite {site_marker, allele_id}
}
};
cached_state = true;
continue;
}
++new_search_state.sa_interval.second;
}
return new_search_states;
}
SearchStates handle_allele_encapsulated_states(const SearchStates &search_states,
const PRG_Info &prg_info) {
SearchStates new_search_states = {};
for (const auto &search_state: search_states) {
bool has_path = not search_state.variant_site_path.empty();
if (has_path) {
new_search_states.emplace_back(search_state);
continue;
}
SearchStates split_seach_states = handle_allele_encapsulated_state(search_state, prg_info);
for (const auto &split_search_state: split_seach_states)
new_search_states.emplace_back(split_search_state);
}
return new_search_states;
}
The above code is called like this:
new_search_states = handle_allele_encapsulated_states(new_search_states, prg_info);
@martinghunt Can you please provide an update to this issue given the latest changes (https://github.com/iqbal-lab-org/gramtools/commit/47e41d7691954eea17b195e49bbe39729e389f7d)?
@martinghunt says this issue can be closed.
This probably affects both kmers (keys to the kmer index) and reads.
Martin reported this on slack: