rapidfuzz / rapidfuzz-rs

Rapid fuzzy string matching in Rust using various string metrics
https://docs.rs/rapidfuzz/latest/rapidfuzz/
Apache License 2.0
34 stars 2 forks source link

Min edit distance with sequence of variable length #5

Open G2GreenTea opened 9 months ago

G2GreenTea commented 9 months ago

I have a question that I need to ask you. There are two different sequences, A and B. I need to compare sequence A with sequence B to find the segment on sequence B that has the smallest editing distance from sequence A. If I take the length of sequence A as the window and move one distance on sequence B each time, the following problem will occur. 1703050602036

If the length of the fragments on sequence B is not fixed, for example, if the length distribution is [len (A) -2, len (A)+2], the results will be more accurate, but there will be significant performance loss. I would like to know if you would have a better way to solve this problem. Looking forward to your reply.

maxbachmann commented 9 months ago

For this use case you would probably want to use Smith Waterman. I have plans to add this to rapidfuzz for a while already: https://github.com/maxbachmann/RapidFuzz/issues/175 however I am not really sure when I will get around to it.