Open needadiff opened 6 months ago
The output looks correct to me. In your case, the third line in table A is almost the same as the second line in table B (except for the ID), so displaying it this way probably yields a smaller diff, e.g.:
- line 2
- line 3
+ line 2
+ line 3
vs
-line 2
my_id: ID5678, attribute: abcde ...
+ line 3
I understand what you are saying, but the output is not correct. Here is a better example to demonstrate:
The first table is displayed correctly. There are two rows for each table, one for each string. The string in row 1, table 1A is "abcf", while the string in row 1, table 1B is "abcef". The difference between the two is the character "e", which is highlighted in green to demonstrate that this character was added.
The second table is not displayed correctly. In this case, we are comparing two strings "abcd" and "abcde". Similar to the first table, this should have two rows, where both "abcd" and "abcde" are lined up together on row1. The character "e" should be highlighted in green to show that the single character was added between the two strings. Instead, there is a blank column added to the table2A, and the entirety of the string "abcde" in table2B is highlighted in green, as if to show that the difference between a blank string and "abcde" is the entirety of the string "abcde".
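To make the expected result concrete, here is a minimal sketch that compares the two string pairs above at the character level with difflib.SequenceMatcher. The helper name char_opcodes is made up for illustration, and this is not how HtmlDiff pairs lines internally; it only shows that each pair differs by a single inserted "e".

```python
from difflib import SequenceMatcher


def char_opcodes(a, b):
    """Return only the non-equal opcodes of a character-level comparison."""
    matcher = SequenceMatcher(None, a, b)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Pair from the first table: "e" is inserted before the final "f".
table1 = char_opcodes("abcf", "abcef")
# Pair from the second table: "e" is appended at the end.
table2 = char_opcodes("abcd", "abcde")

print(table1)  # [('insert', 3, 3, 3, 4)]
print(table2)  # [('insert', 4, 4, 4, 5)]
```

In both cases the only change is one inserted character, which is why I expect both tables to be rendered the same way.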
I have been looking through the source code of difflib to find a solution, without luck. I suspect this is a bug in the library that appears when the lists of strings provided mirror each other too closely. Thoughts?
I think this is because difflib does a line-by-line comparison to find matching lines first before determining how to handle each line. For example, in this code snippet:
from difflib import Differ
lines1 = ['abcdf', 'abc', 'abcde']
lines2 = ['abcde', 'abx', 'abcdg']
print(list(Differ().compare(lines1, lines2)))
The output is:
['- abcdf', '- abc', '  abcde', '+ abx', '+ abcdg']
Within the compare method of the Differ class, a SequenceMatcher object called cruncher is used to find common lines. Because 'abcde' is present in both lines1 and lines2, cruncher will suggest deleting the first two lines of lines1 and adding the last two lines of lines2.
There is, however, no guarantee that the difference generated will have the fewest number of characters.
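The line-level matching step can be inspected directly. The sketch below builds a SequenceMatcher over the same two lists, roughly as Differ.compare does with its default arguments, and prints the opcodes it produces; the variable name opcodes is only for illustration.

```python
from difflib import SequenceMatcher

lines1 = ['abcdf', 'abc', 'abcde']
lines2 = ['abcde', 'abx', 'abcdg']

# Whole lines are the sequence elements here, so only exact line
# matches count; similar-but-different lines are not aligned.
cruncher = SequenceMatcher(None, lines1, lines2)
opcodes = cruncher.get_opcodes()

for tag, i1, i2, j1, j2 in opcodes:
    print(tag, lines1[i1:i2], lines2[j1:j2])
# delete ['abcdf', 'abc'] []
# equal ['abcde'] ['abcde']
# insert [] ['abx', 'abcdg']
```

The single exact match on 'abcde' anchors the alignment, so 'abcdf' and 'abcdg' are never compared with 'abcde' character by character.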
@needadiff Any widely used diff algorithm first tries to find exactly matching lines, rather than finding intraline differences. Maybe trying online diff tools could convince you.
https://www.diffchecker.com/text-compare/
Or give git a try.
Please close this issue if you find this agreeable.
Bug report
Bug description:
I am working with the Python library difflib, specifically the class HtmlDiff. For some reason, the function make_table adds a blank column when generating an HTML table, throwing off the difference highlights and defeating the entire purpose of the diff function altogether.
Input python function:
Output HTML file with problematic data:
The problematic line in the HTML is:
Where there is an extra <td ></td> at the end, adding an extra column.
This is what the table looks like with the unwanted column:
This is what the table is supposed to look like when the columns are aligned correctly
I have tried changing the length of the strings, checking for invisible characters, and removing colons. Something about the strings that I am providing as input throws off the make_table function. I have provided make_table with longer strings and the output was just fine. The behavior is very inconsistent.
Thanks for the help.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux