ztane / python-Levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
GNU General Public License v2.0
1.26k stars 156 forks source link

Q: edit distance between 2 lists #34

Open yasheshgaur opened 6 years ago

yasheshgaur commented 6 years ago

Can some one please tell me the API to calculate the editdistance for 2 lists? I only see the API for 2 strings and in my use case, it is not possible to convert the list into strings.

rljacobson commented 5 years ago

I am not sure your problem is well defined yet. Are you wanting to compare the elements of your list componentwise and then report the sum? What would an example look like?

kkawabat commented 5 years ago

@rljacobson I believe that is what he wants. For example sometimes I need to apply Levenshtein editops on word level rather than char level sequences in which case having the words as element to a list and passing that off to Levenshtein would be helpful.

I currently have a very janky way of doing this by mapping the unique elements of my lists onto a unique char then passing that to the Levenshtein algorithm. The problem with this method is that there is a limited number of chars but there can be an unlimited number of unique elements.

def Levenshtein_editops_list(source, target):
    unique_elements = sorted(set(source + target))
    char_list = list('abcdefghijklmnopqrstuvwxyz0123456789')
    if len(unique_elements) > len(char_list):
        raise Exception("too many elements")
    else:
        unique_element_map = {ele:char_list[i]  for i, ele in enumerate(unique_elements)}
    source_str = ''.join([unique_element_map[ele] for ele in source])
    target_str = ''.join([unique_element_map[ele] for ele in target])
    transform_list = Levenshtein.editops(source_str, target_str)
    return transform_list

target = 'The Cat ate the moon'.split()
source = "The dog jumped over the moon".split()
print(Levenshtein_editop_list(source, target))

output [('delete', 1, 1), ('replace', 2, 1), ('replace', 3, 2)]

MrShininnnnn commented 4 years ago

@kkawabat Thanks for your function. I made a little bit change and it then works perfectly in my case.

def levenshtein_editops_list(source, target):
    unique_elements = sorted(set(source + target)) 
    char_list = [chr(i) for i in range(len(unique_elements))]
    if len(unique_elements) > len(char_list):
        raise Exception("too many elements")
    else:
        unique_element_map = {ele:char_list[i]  for i, ele in enumerate(unique_elements)}
    source_str = ''.join([unique_element_map[ele] for ele in source])
    target_str = ''.join([unique_element_map[ele] for ele in target])
    transform_list = Levenshtein.editops(source_str, target_str)
    return transform_list
rljacobson commented 4 years ago

Given my inability to actively maintain this repo for the foreseeable future and that a solution exists, I am inclined to close this issue—unless someone else wants to work on it.