How to get char positions of changed text in the original text

google / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Apache License 2.0

7.42k stars 1.1k forks source link

import diff_match_patch as dmp_module dmp = dmp_module.diff_match_patch() diff = dmp.diff_main("Hello World.", "Goodbye World.") # Result: [(-1, "Hell"), (1, "G"), (0, "o"), (1, "odbye"), (0, " World.")] #Desired Result:[(-1,"Hell",0,3,x,x),(0,"o",4,4,1,1),(1,"odbye",x,x,2,6),(0," World",6,11,7,13)]

@sathk882 can you explain what each value in your example tuples mean? It's not clear what you are asking for. Are you asking for the ranges in the original text and also in the final text of where the characters in the current diff are to be found?

If that's the case then we really just need to think about what we do with each chunk and how it impacts a running pointer into the source strings.

d = [(-1, "Hell"), (1, "G"), (0, "o"), (1, "odbye"), (0, " World.")]

def step( last_a, last_b, next ):
    op, s = next
    l = len(s)

    if -1 == op:
        return ((op, s, (last_a, last_a + l), None), (last_a + l, last_b))

    if 0 == op:
        return ((op, s, (last_a, last_a + l), (last_b, last_b + l)), (last_a + l, last_b+l))

    if 1 == op:
        return ((op, s, None, (last_b, last_b + l)), (last_a, last_b+l))

o = []
last_a = 0
last_b = 0
for chunk in d:
    next, (next_a, next_b) = step(last_a, last_b, chunk)
    last_a = next_a
    last_b = next_b
    o.append(next)

print(o)

[(-1, 'Hell', (0, 4), None), (1, 'G', None, (0, 1)), (0, 'o', (4, 5), (1, 2)), (1, 'odbye', None, (2, 7)), (0, ' World.', (5, 12), (7, 14))]

i think this will fail in case of repeated substring

Can you explain this more?

google / diff-match-patch

How to get char positions of changed text in the original text #51