The given example shows how to transform word1 to make word2. It is about mutation from one state to another. The goal is to count a series of transformations.
Example 1: "horse" to "ros"
horse -> rose -> ros
return 3
Example 2: "intention" to "execution"
intention -> inention -> enention -> exention -> exection
return 5
The description does not specify the type of edit distance and does not require background knowledge though, it looks like Levenshtein Distance.
Possible operations are Insert, delete and replace. It is noticed that replace is considered as one step. Without it, two steps are required - first delete and replace. To allow replace implies that it shortens the distance, just one step of the operation is enough.
constraint:
0 <= word1.length, word2.length <= 500
word1 and word2 consist of lowercase English letters.
Approach
Tabular, bottom up DP. Break down the problem into a small problem. Solve the smallest problem in order and record it into a 2D array (matrix). The values in the cell stores the distance between the substrings of word1 and word2. It allows us to reuse the previous step to get another solution for a longer substring.
Tabulation:
word1="horse", word2="ros"
r o s
0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 3 3 2
e 5 4 4 3
As an example, I've referenced a python code suggested by Tarun Nayak(N7_BLACKHAT) and a little bit of giving modification just on the coding style.
class Solution:
def minDistance(self, word1: str, word2: str) -> int:
if not word1 and not word2:
return 0
len_word1 = len(word1) if word1 else 0
len_word2 = len(word2) if word2 else 0
# word1: row, word2: col
cost = [[0] * (len_word2 + 1) for _ in range(len_word1 + 1)]
# first column: The cost of change without word2
for r in range(1, len_word1+1):
cost[r][0] = r
# first row: The cost of change without word1
for c in range(1, len_word2+1):
cost[0][c] = c
for r in range(1, len_word1+1):
for c in range(1, len_word2+1):
if word1[r-1] == word2[c-1]:
cost[r][c] = cost[r-1][c-1]
else:
cost[r][c] = min(
cost[r-1][c], # Insert
cost[r][c-1], # Delete
cost[r-1][c-1] # Replace
) + 1
return cost[len_word1][len_word2]
Performance
The solution beats around 30% to 50% different from time to time, it may not an optimal solution. I also had a concern about the dp table, cost. Considering the string length 0 <= word1.length, word2.length <= 500, strings from other test cases can be much longer than the one in the example. It could be much longer up to 500. In worst cases, it will require 501 x 501 = 251,001 elements. Python uses 28 bytes for an integer, 28 bytes x 251,001 elements = 7,028,028 bytes. Is that amount of auxiliary space required all the way through?
Solution
I've look through discussions and found a comment from a user, Maksim Shekhunov.
You don't need to store all dp array.
Based on his suggestion and the example, I've rewritten the code by replacing the cost (m * n matrix) to two lists, prev and curr. It performed way better not only its memory space use but also in computation time.
def minDistance(self, word1: str, word2: str) -> int:
if not word1 and not word2:
return 0
len_word1 = len(word1) if word1 else 0
len_word2 = len(word2) if word2 else 0
prev = list(range(len_word2+1))
cur = [0] * (len_word2 + 1)
for r in range(1, len_word1+1):
cur[0] = r
for c in range(1, len_word2+1):
if word1[r-1] == word2[c-1]:
cur[c] = prev[c-1]
else:
cur[c] = min(
prev[c], # Insert
cur[c-1], # Delete
prev[c-1] # Replace
) + 1
prev = cur
cur = [0] * (len_word2 + 1)
return prev[-1]
Complexity
Full Tabular:
Time complexity: O(mn)
Space complexity: O (mn)
Optimized:
Time complexity: O(mn)
Space complexity: O (n)
Conclusion
"You don't need to store all dp array." - Maksim Shekhunov
Intuition
The given example shows how to transform
word1
to makeword2
. It is about mutation from one state to another. The goal is to count a series of transformations.Example 1: "horse" to "ros" horse -> rose -> ros return 3
Example 2: "intention" to "execution" intention -> inention -> enention -> exention -> exection return 5
The description does not specify the type of edit distance and does not require background knowledge though, it looks like Levenshtein Distance.
See also: Types of edit distance
https://en.wikipedia.org/wiki/Edit_distance
Assumption
Possible operations are
Insert
,delete
andreplace
. It is noticed thatreplace
is considered as one step. Without it, two steps are required - first delete and replace. To allowreplace
implies that it shortens the distance, just one step of the operation is enough.constraint:
0 <= word1.length, word2.length <= 500
Approach
Tabular, bottom up DP. Break down the problem into a small problem. Solve the smallest problem in order and record it into a 2D array (matrix). The values in the cell stores the distance between the substrings of word1 and word2. It allows us to reuse the previous step to get another solution for a longer substring.
Tabulation:
As an example, I've referenced a python code suggested by Tarun Nayak(N7_BLACKHAT) and a little bit of giving modification just on the coding style.
Performance
The solution beats around 30% to 50% different from time to time, it may not an optimal solution. I also had a concern about the dp table,
cost
. Considering the string length0 <= word1.length, word2.length <= 500
, strings from other test cases can be much longer than the one in the example. It could be much longer up to 500. In worst cases, it will require 501 x 501 = 251,001 elements. Python uses 28 bytes for an integer, 28 bytes x 251,001 elements = 7,028,028 bytes. Is that amount of auxiliary space required all the way through?Solution
I've look through discussions and found a comment from a user, Maksim Shekhunov.
Based on his suggestion and the example, I've rewritten the code by replacing the
cost
(m * n matrix) to two lists,prev
andcurr
. It performed way better not only its memory space use but also in computation time.Complexity
Full Tabular:
Optimized:
Conclusion
"You don't need to store all dp array." - Maksim Shekhunov