Open max-arnold opened 8 years ago
I tried to construct a similar synthetic example, but this one works just fine:
diff --git a/1.md b/1.md
--- a/1.md
+++ b/1.md
@@ -1,1 +1,1 @@
-Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy TEXT ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It WAS popularised in the 1960s with the release of Letraset SHEETS containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PAGEMAKER including versions of Lorem Ipsum.
+Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy _text_ ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It _was_ popularised in the 1960s with the release of Letraset _sheets_ containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus _PageMaker_ including versions of Lorem Ipsum.
@max-arnold You need to pass `autojunk=False` to your `SequenceMatcher`. I just tried it and it works as expected. I guess this will hinder performance though.
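For anyone who wants to see the effect in isolation, here is a minimal sketch using only the standard `difflib` module. The `changed_spans` helper and the `abc` filler input are made up for illustration and are not taken from this project. The idea, as far as I understand it: once the second sequence has 200 or more items, the default heuristic stops using very frequent items as match anchors, so a one-character change in repetitive text can be reported as a much larger replaced region.

```python
from difflib import SequenceMatcher


def changed_spans(old, new, autojunk):
    """Return (old_fragment, new_fragment) pairs for every non-equal opcode."""
    matcher = SequenceMatcher(None, old, new, autojunk=autojunk)
    return [
        (old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical input: 601 characters where every filler character is
# frequent, with a single changed character in the middle.
filler = "abc" * 100
old = filler + "X" + filler
new = filler + "Y" + filler

# Heuristic disabled: only the changed character is reported.
print(changed_spans(old, new, autojunk=False))  # [('X', 'Y')]

# Default heuristic: the whole second half is reported as replaced.
default_spans = changed_spans(old, new, autojunk=True)
print(len(default_spans[0][0]))  # 301
```

The cost is what was already guessed above: without the heuristic the matcher does more work on long, repetitive inputs, so it may be worth disabling it only for the intra-line comparison rather than everywhere.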
I found another broken highlight scenario while merging a fairly large Markdown repository. The change looks pretty simple (a few changed words without any appended or prepended text). It is reproducible with the latest revision, 516f48ea3ac29593eafe29100c5ca0f76dd55b03. An anonymized sample is below: