tacitvenom / genomics_algo

MIT License

Frequent Words with Mismatches #21

Closed: tacitvenom closed this issue 3 years ago.

tacitvenom commented 3 years ago

Find the most frequent k-mers with mismatches in a string.

Input: A string Text as well as integers k and d. (You may assume k ≤ 12 and d ≤ 3.)
Output: All most frequent k-mers with up to d mismatches in Text.
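"Up to d mismatches" means Hamming distance at most d: a k-mer Pattern occurs approximately at position i if Text(i, k) differs from Pattern in no more than d positions, and occurrences may overlap. A minimal Python sketch of that definition follows. The function names `hamming_distance` and `approximate_pattern_count` are hypothetical and are not taken from the repo.

```python
def hamming_distance(p: str, q: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(p) != len(q):
        raise ValueError("strings must have equal length")
    return sum(1 for a, b in zip(p, q) if a != b)


def approximate_pattern_count(text: str, pattern: str, d: int) -> int:
    """Count k-mers of text (overlaps allowed) within Hamming distance d of pattern."""
    k = len(pattern)
    return sum(
        1
        for i in range(len(text) - k + 1)  # every start position 0 .. n - k
        if hamming_distance(text[i:i + k], pattern) <= d
    )


# Example: 11 of the 21 windows are within distance 2 of AAAAA.
print(approximate_pattern_count("AACAAGCTGATAAACATTTAAAGAG", "AAAAA", 2))
```

The most frequent k-mers with mismatches are then the k-mers that maximize this count. Note that such a k-mer does not have to appear exactly in Text.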

tacitvenom commented 3 years ago

    FrequentWordsWithMismatches(Text, k, d)
        Patterns ← an array of strings of length 0
        freqMap ← empty map
        n ← |Text|
        for i ← 0 to n - k
            Pattern ← Text(i, k)
            neighborhood ← Neighbors(Pattern, d)
            for j ← 0 to |neighborhood| - 1
                neighbor ← neighborhood[j]
                if freqMap[neighbor] doesn't exist
                    freqMap[neighbor] ← 1
                else
                    freqMap[neighbor] ← freqMap[neighbor] + 1
        m ← MaxMap(freqMap)
        for every key Pattern in freqMap
            if freqMap[Pattern] = m
                append Pattern to Patterns
        return Patterns
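A possible Python translation of this pseudocode is sketched below. The pseudocode calls Neighbors and MaxMap without defining them. Here `neighbors` uses a common recursive construction of the d-neighborhood, and MaxMap is replaced by `max` over the counter values. The names `ALPHABET`, `hamming`, `neighbors`, and `frequent_words_with_mismatches` are hypothetical and not necessarily what the repo uses. The sketch also assumes the DNA alphabet {A, C, G, T} and sorts its output so that results are deterministic.

```python
from __future__ import annotations

from collections import Counter

# Assumption: DNA alphabet, as agreed in this thread.
ALPHABET = "ACGT"


def hamming(p: str, q: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(a != b for a, b in zip(p, q))


def neighbors(pattern: str, d: int) -> set[str]:
    """All strings over ALPHABET within Hamming distance d of pattern."""
    if d == 0:
        return {pattern}
    if len(pattern) == 1:
        return set(ALPHABET)
    result: set[str] = set()
    suffix = pattern[1:]
    for text in neighbors(suffix, d):
        if hamming(suffix, text) < d:
            # A mismatch is still allowed, so any base can go in front.
            result.update(base + text for base in ALPHABET)
        else:
            # The mismatch budget is used up; keep the original first base.
            result.add(pattern[0] + text)
    return result


def frequent_words_with_mismatches(text: str, k: int, d: int) -> list[str]:
    """Most frequent k-mers with up to d mismatches in text."""
    freq_map: Counter[str] = Counter()
    for i in range(len(text) - k + 1):  # i = 0 .. n - k inclusive
        # Each neighbor of the current k-mer gains one count.
        freq_map.update(neighbors(text[i:i + k], d))
    if not freq_map:  # text shorter than k
        return []
    m = max(freq_map.values())  # plays the role of MaxMap(freqMap)
    return sorted(p for p, count in freq_map.items() if count == m)


print(frequent_words_with_mismatches("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4, 1))
```

A `Counter` with a set-valued neighborhood replaces the explicit "if freqMap[neighbor] doesn't exist" branch, because missing keys start at zero.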

SvoONs commented 3 years ago

I'd like to try to contribute here. Do you usually assume set(Text) = {"A", "C", "G", "T"}?

tacitvenom commented 3 years ago

Great! Yep, that's right.
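The {A, C, G, T} assumption also sets the cost of the Neighbors step. Each position can mismatch in 3 ways, so a fixed k-mer has sum over i = 0..d of C(k, i)·3^i neighbors. At the stated bounds k = 12 and d = 3, that is 1 + 36 + 594 + 5940 = 6571 strings per k-mer of Text. The small check below compares this formula against brute-force enumeration. The helper names `neighborhood_size` and `brute_force_size` are hypothetical.

```python
from itertools import product
from math import comb


def neighborhood_size(k: int, d: int, alphabet_size: int = 4) -> int:
    """Strings of length k within Hamming distance d of a fixed k-mer."""
    return sum(comb(k, i) * (alphabet_size - 1) ** i for i in range(min(d, k) + 1))


def brute_force_size(pattern: str, d: int, alphabet: str = "ACGT") -> int:
    """Same count, by enumerating every string of len(pattern) over alphabet."""
    return sum(
        1
        for cand in product(alphabet, repeat=len(pattern))
        if sum(a != b for a, b in zip(pattern, cand)) <= d
    )


print(neighborhood_size(12, 3))  # worst case allowed by the problem statement
```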