oselvar / lhdiff

A Lightweight Hybrid Approach for Tracking Source Lines
MIT License

Simpler diffing #2

Open aslakhellesoy opened 9 months ago

aslakhellesoy commented 9 months ago

Currently we're using one library to generate a unified diff string, then parsing that string back into a data structure.

This is inefficient. We should use a lower-level Myers diff algorithm instead.

It should be fast and memory-efficient (check each candidate's issues and benchmarks).

This one looks promising: https://github.com/thepudds/patience-diff/blob/main/diff.go

It would have to be modified to return data structures instead of unified diff text.

Trying this one, as it seems the most up to date: https://pkg.go.dev/github.com/rogpeppe/go-internal/diff@master
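To make the goal concrete, here is a minimal sketch of a line differ that returns an edit list directly, with no unified diff text in between. It uses a plain LCS table rather than Myers or patience diff, so it is only suitable for small inputs; the names `Edit`, `OpKind` and `DiffLines` are hypothetical and not taken from any of the libraries linked above.

```go
package main

import "fmt"

// OpKind says what happened to a line (hypothetical type for this sketch).
type OpKind int

const (
	Equal OpKind = iota
	Delete
	Insert
)

// Edit is one entry of the edit list. OldLine and NewLine are 0-based
// indexes; -1 means the line does not exist on that side.
type Edit struct {
	Kind    OpKind
	OldLine int
	NewLine int
}

// DiffLines returns an edit list computed from a quadratic LCS table.
// This is NOT Myers diff; it only illustrates the output shape.
func DiffLines(a, b []string) []Edit {
	n, m := len(a), len(b)
	// lcs[i][j] holds the LCS length of a[i:] and b[j:].
	lcs := make([][]int, n+1)
	for i := range lcs {
		lcs[i] = make([]int, m+1)
	}
	for i := n - 1; i >= 0; i-- {
		for j := m - 1; j >= 0; j-- {
			switch {
			case a[i] == b[j]:
				lcs[i][j] = lcs[i+1][j+1] + 1
			case lcs[i+1][j] >= lcs[i][j+1]:
				lcs[i][j] = lcs[i+1][j]
			default:
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}
	// Walk the table from the top-left corner and emit edits.
	var edits []Edit
	i, j := 0, 0
	for i < n && j < m {
		switch {
		case a[i] == b[j]:
			edits = append(edits, Edit{Equal, i, j})
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			edits = append(edits, Edit{Delete, i, -1})
			i++
		default:
			edits = append(edits, Edit{Insert, -1, j})
			j++
		}
	}
	for ; i < n; i++ {
		edits = append(edits, Edit{Delete, i, -1})
	}
	for ; j < m; j++ {
		edits = append(edits, Edit{Insert, -1, j})
	}
	return edits
}

func main() {
	before := []string{"a", "b", "c"}
	after := []string{"a", "x", "c"}
	for _, e := range DiffLines(before, after) {
		fmt.Printf("kind=%d old=%d new=%d\n", e.Kind, e.OldLine, e.NewLine)
	}
}
```

A real replacement would keep the same return type but swap the LCS table for a Myers or patience implementation, since the table needs memory proportional to the product of both file lengths.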

aslakhellesoy commented 9 months ago

Maybe use a diff-match-patch implementation for the initial diffing instead. It can be used for line diffing.
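Line diffing with a character-level differ usually works by mapping every distinct line to a single symbol, diffing the symbol sequences, and mapping the result back. The sketch below shows only that encoding step using the standard library; `LinesToRunes` and `RunesToLines` are hypothetical names for illustration, not the API of any particular diff-match-patch port.

```go
package main

import (
	"fmt"
	"strings"
)

// LinesToRunes encodes both texts so that each distinct line becomes one
// rune. table[r] is the line text for rune r.
func LinesToRunes(a, b string) (encA, encB []rune, table []string) {
	index := map[string]rune{}
	encode := func(text string) []rune {
		var out []rune
		// SplitAfter keeps the trailing "\n" on each line.
		for _, line := range strings.SplitAfter(text, "\n") {
			if line == "" {
				continue // artifact of a text ending in "\n"
			}
			r, ok := index[line]
			if !ok {
				r = rune(len(table))
				index[line] = r
				table = append(table, line)
			}
			out = append(out, r)
		}
		return out
	}
	// Encode in separate statements so table is fully built before return.
	encA = encode(a)
	encB = encode(b)
	return encA, encB, table
}

// RunesToLines reverses the encoding for one rune sequence.
func RunesToLines(enc []rune, table []string) string {
	var sb strings.Builder
	for _, r := range enc {
		sb.WriteString(table[r])
	}
	return sb.String()
}

func main() {
	encA, encB, table := LinesToRunes("a\nb\nc\n", "a\nx\nc\n")
	fmt.Println(encA, encB, len(table))
	fmt.Print(RunesToLines(encB, table))
}
```

The encoded slices can then be handed to any sequence differ, and each resulting edit refers to whole lines.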

As an alternative to SimHash, maybe consider LSH.
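For reference when comparing alternatives, here is a minimal sketch of a 64-bit SimHash over the tokens of a line, with Hamming distance as the similarity measure. Whitespace tokenization and FNV-1a as the token hash are assumptions made for this example; they are not necessarily what this repository uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// SimHash returns a 64-bit fingerprint of a line. Each token votes on every
// bit position; a bit is set when more tokens voted 1 than 0.
func SimHash(line string) uint64 {
	var votes [64]int
	for _, tok := range strings.Fields(line) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		sum := h.Sum64()
		for i := 0; i < 64; i++ {
			if sum&(uint64(1)<<uint(i)) != 0 {
				votes[i]++
			} else {
				votes[i]--
			}
		}
	}
	var out uint64
	for i, v := range votes {
		if v > 0 {
			out |= uint64(1) << uint(i)
		}
	}
	return out
}

// Hamming counts the differing bits between two fingerprints.
// Smaller distances suggest more similar lines.
func Hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	x := SimHash("return foo(bar, baz)")
	y := SimHash("return foo(bar, qux)")
	fmt.Println(Hamming(x, y))
}
```

Any replacement scheme would need to offer the same property: similar lines should produce fingerprints that are cheap to compare and close to each other.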

Other articles:

aslakhellesoy commented 9 months ago

SZZ tools:

A neural variant of SZZ: https://baolingfeng.github.io/papers/ASE2023.pdf

aslakhellesoy commented 9 months ago

Maybe useful for SZZ: https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html

Built into git, but maybe not go-git? https://github.com/tpope/vim-fugitive/issues/1058#issuecomment-529038519