Open mookid opened 5 years ago
I'm hitting diffs where git add -p
can take hours, and it took a while to realize diffr was the cause— i had gotten used to beautiful, useful diffs and forgotten i was not on git standard.
One thing i do not understand is why it has seemingly suddenly gotten much worse for me. As in, in just the last week, went from never thinking about it (effectively instant) to being completely unusable, across multiple projects of the kind i always work with. Some underlying tool that might have gotten an update?
Anyhow strong vote for some way to detect and bail on long lines. Maybe something configurable, and to not try to diff lines longer than 500 characters by default.
sorry about the bad experience. of course, if you have a reproduction case, I would be very happy to have a look.
An relatively easy way to proceed would be to periodically check for maximal processing time for a given hunk, and bail if necessary (and fallback to the default diff processing)
I will most likely no do the coding myself (my company is not very supportive of open source at all) but I can provide some guidance if anyone would want try to tackle it
Part of a reason for the slowness might be a massive amount of generating excess escape sequences. See, e.g. this two lines change looks like this with the escape sequences:
Clearly, most of them are unneeded.
I think that the slow cases are bad cases of the Myers algo, which is quadratic in number of tokens if the before/after are very different. I don't think that the added control chars are enough to make it slower by itself.
I experience the issue with VERY large diffs. 500 lines was fine but 1k+ it becomes a problem.
Here's a script to reproduce the issue
tr -dc a-z1-4 </dev/urandom |
tr 1-2 ' \n' |
awk 'length==0 || length>50' |
tr 3-4 ' ' |
sed 's/^ *//' |
cat -s |
sed 's/ / /g' |
fmt |
head -n 2000 > /tmp/random-1.txt &&
tr -dc a-z1-4 </dev/urandom |
tr 1-2 ' \n' |
awk 'length==0 || length>50' |
tr 3-4 ' ' |
sed 's/^ *//' |
cat -s |
sed 's/ / /g' |
fmt |
head -n 2000 > /tmp/random-2.txt &&
git diff --no-index /tmp/random-1.txt /tmp/random-2.txt |
diffr
hi @CervEdin
thanks for the report! I have a patch for that, which is very simple. unfortunately, my company does not really allow open source work, so this is stuck there for a while.
Hi @mookid
That's great! However, I'm a bit perplexed. Aren't you the maintainer?
I am. I can provide some help to someone motivated to write the code, but can't submit it myself.
The Myers algorithm is incremental: if find the shortest path of length D for each D increasing; my patch just stops after D large-ish (if D is large, the diff will have poor readability annyway) and just falls back to "remove everything, add everything".
Fair enough. Maybe I take a look at some point. For now I just switched back the pager of things like git log to diff-highlight
On Thu, Apr 20, 2023, 4:17 AM Nathan Moreau @.***> wrote:
I am. I can provide some help to someone motivated to write the code, but can't submit it myself.
The Myers algorithm is incremental: if find the shortest path of length D for each D increasing; my patch just stops after D large-ish (if D is large, the diff will have poor readability annyway) and just falls back to "remove everything, add everything".
— Reply to this email directly, view it on GitHub https://github.com/mookid/diffr/issues/14#issuecomment-1515616491, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXL5ESXXHC5NYXPZZO7DE3XCCMCLANCNFSM4IFNDB6Q . You are receiving this because you were mentioned.Message ID: @.***>
@CervEdin #96 should help with your problem, probably (your script does not work on my system for some reason, I get tr: Illegal byte sequence
when running it)
@mookid thanks! Maybe the test script I wrote is using something non POSIX. It works in WSL and git-bash.
I was able to test that 9bfc9f264739c2324d7fcd1f66038d06b67ff116
allows using --large-diff-threshold 200
, to skip. But I don't think I saw a mention of the flag in --help
.
I'm wasn't able to cargo install --path .
in master
, geting issues with is_terminal
.
error[E0658]: use of unstable library feature 'is_terminal'
--> src/cli_args.rs:7:5
|
7 | use std::io::IsTerminal;
| ^^^^^^^^^^^^^^^^^^^
|
= note: see issue #98070 <https://github.com/rust-lang/rust/issues/98070> for more information
Thank you for adding this feature, it's going to make the tool much more usable, at least for me ♥
I was able to test that
9bfc9f264739c2324d7fcd1f66038d06b67ff116
allows using--large-diff-threshold 200
, to skip. But I don't think I saw a mention of the flag in--help
.no, I made it a 'private' flag for now
I'm wasn't able to
cargo install --path .
inmaster
, geting issues withis_terminal
.
you need a recent-enough version of the rust toolchain, as is_terminal has been stabilised relatively recently; or use the unstable version: cargo +nightly install --path .
probably works (not tested)
Thank you for adding this feature, it's going to make the tool much more usable, at least for me ♥
<3
See https://scripter.co/git-diff-minified-js-and-css/
On the repository https://github.com/gohugoio/hugoDocs,
git log -p -- themes/gohugoioTheme/static/dist/app.bundle.js | diffr
takes ~9min on my machine.One problematic commit: 0960c5fb36e9abcf3b492a523943268e89066543
Explore possible workarounds: