Length of longest common subsequence of m, n

xxleyi commented 5 years ago

⚠️是 subsequence, 而不是 substring

这个问题之前研究过，但那时候是对着答案理解梳理的。本次独立进行一遍全部过程，遇到问题就克服，然后记下来，作为自己的薄弱环节。

m = "abcliuxm"
n = "liuximin"
get_max_lcs_len(m, n) => 5
get_max_lcs_len(m, n*1000) => 5

这个问题是典型的动态规划，之前我自己总结过一个解决动态规划问题的基本过程：

写出递归解
分析递归解时间复杂度，很可能是指数级别
找出子函数重复调用部分，写出带缓存的递归解
代码在此时可能略显繁琐，但好在时间复杂度应该已经得到极大改善
分析缓存的数据结构，逆向还原迭代解

# 无缓存递归，很简洁，但可惜复杂度堪忧
def get_max_lcs_len(m, n):
    def _get_max(i_m, i_n):
        if i_m == -1 or i_n == -1:
            return 0
        if m[i_m] == n[i_n]:
            return 1 + _get_max(i_m - 1, i_n - 1)
        else:
            return max(_get_max(i_m - 1, i_n), _get_max(i_m, i_n - 1))

    return _get_max(len(m) - 1, len(n) - 1)

# 带缓存递归版，代码略显繁琐，但性能已经够用
def get_max_lcs_len(m, n):
    cache = {}

    def _get_max(i_m, i_n):
        if (i_m, i_n) in cache:
            return cache[(i_m, i_n)]

        if i_m == -1 or i_n == -1:
            res = 0
        elif m[i_m] == n[i_n]:
            res = 1 + _get_max(i_m - 1, i_n - 1)
        else:
            res = max(_get_max(i_m - 1, i_n), _get_max(i_m, i_n - 1))
        cache[(i_m, i_n)] = res

        return res

    return _get_max(len(m) - 1, len(n) - 1)

# 终极迭代版，至拙至巧，代码和思路之间的 gap 略大，很难一下子就想到迭代解
def get_max_lcs_len_it(m, n):
    t = [[0] * (len(n) + 1) for _ in range(len(m) + 1)]
    i = j = 0
    for i in range(1, len(m) + 1):
        for j in range(1, len(n) + 1):
            if m[i - 1] == n[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[i][j]

# 出于一致性考虑，递归解改写一下形式
def get_max_lcs_len(m, n):
    def _get_max(i, j):
        if i == 0 or j == 0:
            return 0
        if m[i - 1] == n[j - 1]:
            return 1 + _get_max(i - 1, j - 1)
        else:
            return max(_get_max(i - 1, j), _get_max(i, j - 1))

    return _get_max(len(m), len(n))

怎么样，是不是出奇的不可思议的一致。

xxleyi commented 5 years ago

暴露出来的薄弱环节主要是二维数组使用比较生疏，不能信手拈来。

xxleyi commented 5 years ago

带缓存版本的优化

其实手写带缓存版本的逻辑基本是一致且冗余的，所以完全可以使用装饰器来一次性解决这个问题。更进一步的，Python 中其实已经提供了一个更好的装饰器：functools.lru_cache:

lru_cache?
Signature: lru_cache(maxsize=128, typed=False)
Docstring:
Least-recently-used cache decorator.

If *maxsize* is set to None, the LRU features are disabled and the cache
can grow without bound.

If *typed* is True, arguments of different types will be cached separately.
For example, f(3.0) and f(3) will be treated as distinct calls with
distinct results.

Arguments to the cached function must be hashable.

View the cache statistics named tuple (hits, misses, maxsize, currsize)
with f.cache_info().  Clear the cache and statistics with f.cache_clear().
Access the underlying function with f.__wrapped__.

See:  http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
File:      /usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/functools.py
Type:      function

# 优化之后的带缓存递归版本
import functools
cache = functools.lru_cache(maxsize=None)

def get_max_lcs_len(m, n):
    @cache
    def _get_max(i, j):
        if i == 0 or j == 0:
            return 0
        if m[i - 1] == n[j - 1]:
            return 1 + _get_max(i - 1, j - 1)
        return max(_get_max(i - 1, j), _get_max(i, j - 1))

    return _get_max(len(m), len(n))

怎么样，是不是超级酷？Python 中写递归还是蛮舒服的。

xxleyi / learning_list

Length of longest common subsequence of m, n #168

带缓存版本的优化