Open maxbachmann opened 2 years ago
Hi, will this algorithm be added recently?
I am not sure yet. I think I could add a simple implementation in the close future. When matching long sequences you would probably want to use a more optimized implementation like: https://github.com/jeffdaily/parasail (has python bindings) I do not think I will have the time to write an implementation that is even close to this level of optimization.
Got it, thanks for the great work!
After looking at the paraseil python bindings I found them way to hard to use. E.g.
Be careful using the attributes of the Result object - especially on Result instances constructed on the fly. For example, calling parasail.sw_trace("asdf", "asdf", 11, 1, parasail.blosum62).cigar.seq returns a numpy.ndarray that wraps a pointer to memory that is invalid because the Cigar is deallocated before the seq statement. You can avoid this problem by assigning Result instances to variables as in the example above.
Is not really acceptable in my opinion. So I decided to add at least a simple implementation for people who just want to have a properly working implementation.
The Smith Waterman algorithm is a commonly used metric to compare strings. It would be useful to add it to RapidFuzz.