tk0miya / diff-highlight

Apache License 2.0

Another example of missing highlight #2

Open: max-arnold opened this issue 8 years ago

max-arnold commented 8 years ago

I found another broken highlight scenario while merging a fairly large Markdown repository. The change looks simple: a few words are replaced, with no text appended or prepended. It is reproducible with the latest revision, 516f48ea3ac29593eafe29100c5ca0f76dd55b03. An anonymized sample is below:

diff --git a/1.md b/1.md
--- a/1.md
+++ b/1.md
@@ -1,1 +1,1 @@
-Aa aaa'a aaaa aa aaaaa aaaa aaaaaaaa. Aaa'a aaaaaa aaaa aaaa aaaa aaa'a aa aaaaa aaa aaa aa aaa aaaaa. Aa'a aaaaaaaaaaaa aa aaaaaa aaaa aaaa aaaa aa aaa AAAA aaaaaaa aa aaaaaaa aaaaa aaaa, aa aa aaaaa aaaa aaaa aa aa a aaaa aaaa aa aaaaaa. Aa aaaaa aaa aaaa aa aaaa aaaa aaaaaaaa aaa aaaa aaaaaaaaaaaa aaaaa aaa aaaaaaaaaaaa aaa aaaaaaaaaa. Aaaaa aa aaaaaa a aaaa aaaa aa aaaa. Aaa aaaa aa aaaa aaaa aaaaaa aaa aaaaaaaaaa. Aaa _aaaa_ aa aaaa aaaa aaaaaa aaa aaaaaaaaaa. Aaa aaa aaaa aa aaaaaaaaaaaaaa (aa aaaaa aaaaaa aa aaaaa AAA aaaaaaaaaaa aa) aaa aa AAAA aaaa aaa aaaaaaaaaaaaa aa aaaaaaaaaaaa aa aaaa aa aaaaaaaaa aaa aaaaaaaaa aa aaaaaa aaa aaaaaa.
+Aa aaa'a aaaa aa aaaaa aaaa aaaaaaaa. Aaa'a aaaaaa aaaa aaaa aaaa aaa'a aa aaaaa aaa aaa aa aaa aaaaa. Aa'a aaaaaaaaaaaa aa aaaaaa aaaa aaaa aaaa aa aaa _aaaa_ aaaaaaa aa aaaaaaa aaaaa aaaa, aa aa aaaaa aaaa aaaa aa aa a aaaa aaaa aa aaaaaa. Aa aaaaa aaa aaaa aa aaaa aaaa aaaaaaaa aaa aaaa aaaaaaaaaaaa aaaaa aaa aaaaaaaaaaaa aaa aaaaaaaaaa. Aaaaa aa aaaaaa a aaaa aaaa aa aaaa. Aaa aaaa aa aaaa aaaa aaaaaa aaa aaaaaaaaaa. Aaa _aaaa_ aa aaaa aaaa aaaaaa aaa aaaaaaaaaa. Aaa aaa aaaa aa aaaaaaaaaaaaaa (aa aaaaa aaaaaa aa aaaaa _aaa_ aaaaaaaaaaa aa) aaa aa _aaaa_ aaaa aaa aaaaaaaaaaaaa aa aaaaaaaaaaaa aa aaaa aa aaaaaaaaa aaa aaaaaaaaa aa aaaaaa aaa aaaaaa.

max-arnold commented 8 years ago

I tried to construct a similar synthetic example, but this one is highlighted correctly:

diff --git a/1.md b/1.md
--- a/1.md
+++ b/1.md
@@ -1,1 +1,1 @@
-Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy TEXT ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It WAS popularised in the 1960s with the release of Letraset SHEETS containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PAGEMAKER including versions of Lorem Ipsum.
+Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy _text_ ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It _was_ popularised in the 1960s with the release of Letraset _sheets_ containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus _PageMaker_ including versions of Lorem Ipsum.
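One plausible reason the two samples behave differently, assuming the highlighter compares the old and new lines character by character with Python's difflib.SequenceMatcher under its default autojunk heuristic: once a sequence has 200 or more items, every item occurring more than len(b) // 100 + 1 times is treated as "popular" and dropped from the matcher's index. A line built from a handful of repeated letters then has almost nothing left to anchor a match on, while ordinary prose keeps rare characters (digits, capitals, punctuation) as anchors. The sketch below reads the documented b2j attribute; both input lines are hypothetical, shortened stand-ins, not the actual samples above.

```python
from difflib import SequenceMatcher


def surviving_anchors(line: str) -> set:
    """Return the characters of `line` that SequenceMatcher can still index.

    The popularity heuristic is applied to the second sequence (`b`), so the
    line is passed as `b`. Items removed as popular end up in `bpopular`;
    whatever remains is a key of `b2j`.
    """
    matcher = SequenceMatcher(None, "", line)  # autojunk=True by default
    return set(matcher.b2j)


# Hypothetical anonymized-style line: only 'a', ' ' and '_' occur.
anonymized = "aaa aa aaaa " * 30 + "_aaaa_" + " aaaa aa aaa" * 30

# Hypothetical varied-prose line (207 characters, so the heuristic applies).
varied = (
    "Lorem Ipsum has been the industry's standard dummy _text_ ever since "
    "the 1500s, when an unknown printer took a galley of type and scrambled "
    "it to make a type specimen book. It _was_ popularised in the 1960s."
)

print(sorted(surviving_anchors(anonymized)))  # ['_']
print(sorted(surviving_anchors(varied)))
```

In the anonymized-style line only "_" survives, and it never occurs in the corresponding old line, so the matcher has no anchor inside the changed region; the prose line keeps several rare characters such as "5", "9", and "L".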

sbraz commented 6 years ago

@max-arnold You need to pass autojunk=False to your SequenceMatcher. I just tried it, and the highlight works as expected. I suspect this will hurt performance, though.
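A minimal sketch of what that flag changes, using Python's difflib on a hypothetical anonymized-style line with one word replaced in the middle. Comparing the lines character by character is an assumption made for illustration; diff-highlight's own tokenization may differ.

```python
from difflib import SequenceMatcher

# Hypothetical anonymized-style line: "AAAA" is replaced by "_aaaa_".
OLD = "aaa aa aaaa " * 30 + "AAAA" + " aaaa aa aaa" * 30
NEW = "aaa aa aaaa " * 30 + "_aaaa_" + " aaaa aa aaa" * 30


def opcodes(old: str, new: str, autojunk: bool):
    """Character-level edit script turning `old` into `new`."""
    return SequenceMatcher(None, old, new, autojunk=autojunk).get_opcodes()


for autojunk in (True, False):
    print(f"autojunk={autojunk}")
    for tag, i1, i2, j1, j2 in opcodes(OLD, NEW, autojunk):
        print(f"  {tag:8} old[{i1}:{i2}] -> new[{j1}:{j2}]")
```

With the default heuristic, "a" and the space are both popular in the 726-character new line, so the matcher only recovers the common prefix and reports everything after it as a single replace, old[360:724] -> new[360:726]. With autojunk=False the change narrows to old[360:364] -> new[360:366] ("AAAA" -> "_aaaa_") and the unchanged suffix is matched again. The trade-off is that frequent characters stay in the matcher's index, so it does more work on long lines, which is the performance cost mentioned above.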