python / cpython

The Python programming language
https://www.python.org/

difflib #49139

Closed ed93c279-aa10-4fac-bd70-e044d05f4e2d closed 15 years ago

ed93c279-aa10-4fac-bd70-e044d05f4e2d commented 15 years ago
BPO 4889
Nosy @amauryfa, @jackdied
Files
  • c1.ios: This file contains the two strings mentioned above. Save them in two different files and then take the diff of the files.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['invalid', 'type-bug', 'library']
    title = 'difflib'
    updated_at =
    user = 'https://bugs.python.org/pratikpotnis'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'jackdied'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'jackdied'
    components = ['Library (Lib)']
    creation =
    creator = 'pratik.potnis'
    dependencies = []
    files = ['12659']
    hgrepos = []
    issue_num = 4889
    keywords = []
    message_count = 4.0
    messages = ['79455', '79457', '79721', '84224']
    nosy_count = 4.0
    nosy_names = ['amaury.forgeotdarc', 'ggenellina', 'jackdied', 'pratik.potnis']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4889'
    versions = ['Python 2.5']
    ```

    ed93c279-aa10-4fac-bd70-e044d05f4e2d commented 15 years ago

    When using HtmlDiff() from the difflib library, I do not get proper diff results if two strings differ in letter case. The two strings I used, each in its own file, are: hostname vaijain123 (this string is in lowercase) and hostname CAVANC1001CR1 (this one is in uppercase).

    Expected behavior after diffing: it should show the hostname as changed (and highlight it in yellow).

    Instead, it shows the line as added in one file and deleted in the other (highlighting them in green and red respectively).

    When I tried with the same case in both strings (either lowercase or uppercase), it showed the expected behavior (highlighting the strings in yellow). It also works well with numbers.

    I think it's an issue with the case of the letters. difflib is not able to differentiate between uppercase and lowercase letters.

    amauryfa commented 15 years ago

    Can you be more precise? I tried to reproduce your problem, but I only get added/deleted chunks, nothing in yellow.

    Please include a script that shows what you did, and the result you expected.

    1fd7a44c-f7f2-43ed-9c9f-bafa512b8598 commented 15 years ago

    You (as a human) most likely parse these lines:

    hostname vaijain123
    hostname CAVANC1001CR1

    as "two words, the first one is the same, the second word changed". But difflib sees them more or less as: "21 letters, 8 of them are the same, 13 are different". There are many more differences than matches, so it makes sense to show the changes as a complete replacement:

    >>> d = difflib.ndiff(["hostname vaijain123\n"], ["hostname CAVANC1001CR1\n"])
    >>> print ''.join(d)
    - hostname vaijain123
    + hostname CAVANC1001CR1

    It has nothing to do with upper or lower case letters ("A" and "a" are completely different things for difflib). If the names were shorter, it might consider a match:

    >>> d = difflib.ndiff(["hostname vai\n"], ["hostname CAV\n"])
    >>> print ''.join(d)
    - hostname vai
    ?          ^^^
    + hostname CAV
    ?          ^^^
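
To make the point about case concrete: a letter that differs only in case counts as a plain mismatch, exactly like any other changed character. A small sketch, assuming Python 3; the helper `ratio` is a made-up name for this illustration, and lowercasing both inputs is just one possible way to get a case-insensitive comparison, not something difflib does on its own.

```python
import difflib

def ratio(a, b):
    """Similarity ratio of two strings (hypothetical helper)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Only the case of the last three letters differs, yet none of them match.
case_only = ratio("hostname vai", "hostname VAI")

# Folding both strings to lowercase first makes them identical.
folded = ratio("hostname vai".lower(), "hostname VAI".lower())

print(case_only, folded)
```

Here `case_only` comes out as 0.75 and `folded` as 1.0, so a case-only change costs as much as replacing the letters outright.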

    Note how the ratio changes:

    >>> difflib.SequenceMatcher(None, "hostname vaijain123", "hostname CAVANC1001CR1").ratio()
    0.48780487804878048
    >>> difflib.SequenceMatcher(None, "hostname vai", "hostname CAV").ratio()
    0.75
    0.75

    The ratio must be 0.75 or higher for a differ to consider two lines "close enough" to show intra-line differences.
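
The ratios above can be reproduced by hand: `SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings. The following is a minimal sketch, assuming Python 3. The helper `similarity` and the constant `CUTOFF` are names made up for this illustration, and the strings are compared without the trailing newline that `ndiff` sees, so the ratios computed inside `ndiff` differ slightly.

```python
import difflib

CUTOFF = 0.75  # threshold quoted in the comment above (illustrative constant)

def similarity(a, b):
    """Return (matched, total, ratio) for two strings (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # The final block returned is a zero-length sentinel, so it adds nothing.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    return matched, total, matcher.ratio()

pairs = [
    ("hostname vaijain123", "hostname CAVANC1001CR1"),
    ("hostname vai", "hostname CAV"),
]
for old, new in pairs:
    matched, total, ratio = similarity(old, new)
    verdict = "close enough" if ratio >= CUTOFF else "treated as delete + add"
    print(f"{old!r} -> {new!r}: 2*{matched}/{total} = {ratio:.3f} ({verdict})")
```

For the long pair this gives M = 10 and T = 41 (ratio about 0.488), while the short pair gives M = 9 and T = 24 (exactly 0.75), which is why only the short pair gets intra-line markers.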

    jackdied commented 15 years ago

    Closing; Gabriel's explanation is sufficient.