rtfpessoa / diff2html

Pretty diff to html javascript library (diff2html)
https://diff2html.xyz
MIT License

Takes hours to finish a 300 KB diff file #67

Closed by rtfpessoa 8 years ago

rtfpessoa commented 8 years ago

MOVED FROM diff2html-cli#17

Hi, I really love the tool and am currently running it under Windows. However, my git diff file is around 300 KB, and the tool takes 3 hours to finish without producing any output file (I am using the -F option). Memory usage is around 800 MB.
Just wondering if you have encountered the same issue before?

I tried diffy.org with no issues at all.
https://diffy.org/diff/4wng00ndqz7iudi

Thanks,
Travis

Attached: diffReport.txt

rtfpessoa commented 8 years ago

@escitalopram I did some debugging and the Rematch.distance(amod, bmod) algorithm is taking too long, and may even be stuck in an infinite loop. Do you have any idea?

escitalopram commented 8 years ago

I'll have a look

escitalopram commented 8 years ago

The problem seems to be triggered by large blocks of changes, like OASIS.csproj having about 2.2k lines added and removed in one block. The algorithm is O(nm) time with n lines added and m lines removed in a single block, which here means almost 5 million Levenshtein distance calculations, each of which is in turn O(op) time with o and p being the line lengths. I suggest we just disable the line matching on blocks larger than, say, n*m=2500 (and maybe make that limit configurable).
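
To make the proposal concrete, here is a minimal sketch of such a guard. It is not the actual diff2html code: `levenshtein`, `matchLines` and `maxComparisons` are hypothetical names, and the greedy nearest-line pairing is a simplified stand-in for what Rematch does. With roughly 2,200 lines on each side, n*m is about 4.84 million, far above the 2500 limit, so the block would be skipped.

```javascript
// Hypothetical sketch; names are illustrative, not diff2html's actual API.

// Dynamic-programming Levenshtein distance: O(o * p) for strings of length o and p.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Pair each removed line with its closest added line (O(n * m) distance calls),
// unless the block is too large. Returns null when matching is skipped so the
// caller can fall back to rendering the block without line matching.
function matchLines(removed, added, maxComparisons = 2500) {
  if (removed.length * added.length > maxComparisons) {
    return null;
  }
  return removed.map((oldLine) => {
    let best = null;
    let bestDistance = Infinity;
    for (const newLine of added) {
      const d = levenshtein(oldLine, newLine);
      if (d < bestDistance) {
        bestDistance = d;
        best = newLine;
      }
    }
    return { oldLine, newLine: best, distance: bestDistance };
  });
}
```

Returning `null` instead of throwing keeps the fallback cheap: the caller only has to check for it and render the added and removed lines unmatched.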

The high memory usage will probably go away with that too, because there is a cache for distance function results. If that isn't enough, I could also introduce a hash function for the cache keys.
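
A rough sketch of what hashed cache keys could look like, assuming the cache is a simple map keyed on the pair of compared lines. The names `fnv1a` and `memoizeDistance` are hypothetical, and the hash is an FNV-1a-style 32-bit hash over UTF-16 code units. Note the trade-off: short hashed keys save memory compared to storing full line pairs, but two different pairs can collide and return a wrong cached distance.

```javascript
// Hypothetical sketch; not diff2html's actual cache implementation.

// FNV-1a-style 32-bit hash over the UTF-16 code units of a string.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Wrap a distance function with a cache keyed on the hashes of both inputs,
// so the cache stores short keys instead of the full line contents.
function memoizeDistance(distanceFn) {
  const cache = new Map();
  const cached = (a, b) => {
    const key = fnv1a(a) + ":" + fnv1a(b);
    if (!cache.has(key)) {
      cache.set(key, distanceFn(a, b));
    }
    return cache.get(key);
  };
  cached.cacheSize = () => cache.size;
  return cached;
}
```

If collisions turn out to matter in practice, a wider hash or storing the original strings alongside the value for verification would be the next thing to try.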

rtfpessoa commented 8 years ago

I think that is a great idea. Can you make a PR?

escitalopram commented 8 years ago

Which branch should I base it on?

rtfpessoa commented 8 years ago

master

rtfpessoa commented 8 years ago

Fixed by #68 in release 2.0.0-beta10