Open julochrobak opened 10 years ago
As tempting as it looks, I would rather not introduce the State property for the sake of performance. It has been a while, but if you are still interested in optimizing the approximate string matching, there is a different way to approach it. Instead of using len(s)*len(t)
space to store the matrix D
, you can do away with just 2 * min(len(s), len(t))
by keeping only two rows or columns (whichever is shorter). Here is the main loop over the matrix:
for j := 1; j < n; j++ {
for i := 1; i < m; i++ {
if s[i-1] == t[j-1] {
d[i][j] = d[i-1][j-1] // no operation required
} else {
d[i][j] = min(
d[i-1][j]+1, // a deletion
d[i][j-1]+1, // an insertion
d[i-1][j-1]+1) // a substitution
}
}
}
Note that the indexes only go back as far as i-1
or j-1
.
This improves the performance of the fuzzy function by avoiding allocation of the matrix on each call. In other words, there is a matrix allocated on the first call and reused on the following calls. If bigger matrix is required it is recreated.
In order to allow parallel execution we cannot use a single global variable to store the matrix. I've introduced a State property to the Func structure which allows every function to have a state and reuse it among the calls.
Another small performance improvement is to use own min/max function and avoid the built-in Min/Max on floats.