goungoun / leetcode

Keep Going!
https://leetcode.com/u/gounna/
0 stars 0 forks source link

72. Edit Distance #49

Closed goungoun closed 1 month ago

goungoun commented 1 month ago

Intuition

The given example shows how to transform word1 to make word2. It is about mutation from one state to another. The goal is to count a series of transformations.

Example 1: "horse" to "ros" horse -> rose -> ros return 3

Example 2: "intention" to "execution" intention -> inention -> enention -> exention -> exection return 5

The description does not specify the type of edit distance and does not require background knowledge though, it looks like Levenshtein Distance.

See also: Types of edit distance

https://en.wikipedia.org/wiki/Edit_distance

Assumption

Possible operations are Insert, delete and replace. It is noticed that replace is considered as one step. Without it, two steps are required - first delete and replace. To allow replace implies that it shortens the distance, just one step of the operation is enough.

constraint:

Approach

Tabular, bottom up DP. Break down the problem into a small problem. Solve the smallest problem in order and record it into a 2D array (matrix). The values in the cell stores the distance between the substrings of word1 and word2. It allows us to reuse the previous step to get another solution for a longer substring.

Tabulation:

word1="horse", word2="ros"
        r o s
      0 1 2 3
    h 1 1 2 3 
    o 2 2 1 2
    r 3 2 2 2
    s 4 3 3 2
    e 5 4 4 3

As an example, I've referenced a python code suggested by Tarun Nayak(N7_BLACKHAT) and a little bit of giving modification just on the coding style.

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        if not word1 and not word2:
            return 0

        len_word1 = len(word1) if word1 else 0
        len_word2 = len(word2) if word2 else 0

        # word1: row, word2: col
        cost = [[0] * (len_word2 + 1) for _ in range(len_word1 + 1)]

        # first column: The cost of change without word2
        for r in range(1, len_word1+1):
            cost[r][0] = r

        # first row: The cost of change without word1
        for c in range(1, len_word2+1):
            cost[0][c] = c

        for r in range(1, len_word1+1):
            for c in range(1, len_word2+1):
                if word1[r-1] == word2[c-1]:
                    cost[r][c] = cost[r-1][c-1]
                else:
                    cost[r][c] = min(
                        cost[r-1][c],   # Insert
                        cost[r][c-1],   # Delete
                        cost[r-1][c-1]  # Replace
                    ) + 1

        return cost[len_word1][len_word2]

Performance

The solution beats around 30% to 50% different from time to time, it may not an optimal solution. I also had a concern about the dp table, cost. Considering the string length 0 <= word1.length, word2.length <= 500, strings from other test cases can be much longer than the one in the example. It could be much longer up to 500. In worst cases, it will require 501 x 501 = 251,001 elements. Python uses 28 bytes for an integer, 28 bytes x 251,001 elements = 7,028,028 bytes. Is that amount of auxiliary space required all the way through?

Solution

I've look through discussions and found a comment from a user, Maksim Shekhunov.

You don't need to store all dp array.

Based on his suggestion and the example, I've rewritten the code by replacing the cost (m * n matrix) to two lists, prev and curr. It performed way better not only its memory space use but also in computation time.

    def minDistance(self, word1: str, word2: str) -> int:
        if not word1 and not word2:
            return 0

        len_word1 = len(word1) if word1 else 0
        len_word2 = len(word2) if word2 else 0

        prev = list(range(len_word2+1))
        cur = [0] * (len_word2 + 1)

        for r in range(1, len_word1+1):
            cur[0] = r
            for c in range(1, len_word2+1):
                if word1[r-1] == word2[c-1]:
                    cur[c] = prev[c-1]
                else:
                    cur[c] = min(
                        prev[c],   # Insert
                        cur[c-1],  # Delete
                        prev[c-1]  # Replace
                    ) + 1

            prev = cur
            cur = [0] * (len_word2 + 1)

        return prev[-1]

Complexity

Full Tabular:

Optimized:

Conclusion

"You don't need to store all dp array." - Maksim Shekhunov

goungoun commented 1 month ago

Merged to the main branch: https://github.com/goungoun/leetcode/blob/main/DynamicProgramming/edit_distance.py