williamfiset / Algorithms

A collection of algorithms and data structures
MIT License
17.12k stars 4.33k forks source link

Suffix Array LCS broken? #70

Closed williamfiset closed 5 years ago

williamfiset commented 5 years ago

Investigate the behavior of the LCS method under different k values to determine if it's working properly -- especially with long strings.

I haven't looked into the details yet, but assuming the initial approach is still correct and there's just a bug in the code (which is quite possible) it may be worth investigating to see if we can swap out the current SA implementation with the slower but more stable implementation. If that doesn't work, I can also swap out the sliding window minimum with a min range query segment tree to see if I get a different result.

williamfiset commented 5 years ago

Added Lcs.java and it seems to be working ATM. I will replace LongestCommonSubstring.java when sufficient testing has been done.

IsraaMa commented 5 years ago
Screen Shot 2019-04-01 at 2 58 24 PM

Here we can see the bug. The first part (left) shows the dataset used to test the algorithm. The second part (middle) shows the output of LongestCommonSubstring.java, the expected (correct) results are shown on the third box (right).

We can see that on k_value = 3,6 and 7 the output doesn't match the expected result.

Possible reasons of the bug: -Sliding window: When k = 6 the LongestCommonSubstring.java algorithm outputs "TA", but "TA" is not present in 6 strings, it is only present in 4 strings. The same happens with k = 7.

williamfiset commented 5 years ago

Should be working now! Please re-open if you find a breaking test case