hauntsaninja / mypy_primer

Run mypy and pyright over millions of lines of code
MIT License
57 stars 30 forks source link

feat: improve diff formatting #12

Closed atakiar closed 3 years ago

atakiar commented 3 years ago

Description

The goal of this PR is to improve the formatting of the diff reported - especially when it comes to multiline notes and errors. See python/mypy#10572 for an example of the current behavior

Issue

Expected diff:

+ aioredis/client.py:4261: note:      Superclass:
+ aioredis/client.py:4261: note:          def (self: Redis) -> Any
+ aioredis/client.py:4261: note:      Subclass:
+ aioredis/client.py:4261: note:          def __aenter__(self) -> Coroutine[Any, Any, Pipeline]

Actual diff:

+ aioredis/client.py:4257: note:          def (self: Redis) -> Any
+ aioredis/client.py:4257: note:          def __aenter__(self) -> Coroutine[Any, Any, Pipeline]
+ aioredis/client.py:4257: note:      Subclass:
+ aioredis/client.py:4257: note:      Superclass:

Potential Solution

To get the expected diff ordering, we can simply do away with sorting errors from the same line. However, this may result in a diff that indicates the same line was both added and removed, e.g.

- a.py:1: error ...
+ a.py:1: error ...

To handle this new issue we can filter these duplicates out by first identifying the exact type and number of removals and additions that are taking place

We then subtract these counters from one another simultaneously and get an appropriate count of removals and additions that must be present in our final diff

Finally, we iterate over the diff's lines, filtering with the counters, updating as we go, and report a diff in a more readable order

Let me know what you think!

atakiar commented 3 years ago

Thanks for the quick review! I've made some changes accordingly, let me know what you think!

  1. Good catch, I've removed the itertools.groupby
  2. Yeah that's a lot cleaner, I've updated the code to use the net_change instead of the counters! However it's still missing something. Consider the following:
    old = """\
    test.py:1: error: First
    test.py:1: error: Second
    test.py:1: error: Second
    """
    new = """\
    test.py:1: error: Second
    test.py:1: error: First
    """

    The net_change would be defaultdict(<class 'int'>, {'test.py:1: error: Second': -1}) and we would print out:

    + test.py:1: error: Second
    - test.py:1: error: Second
    - test.py:1: error: Second

    So, we need to handle both the sign and magnitude of the net_change, e.g. -1 means print out just 1 removal of that line. Also, once a line is processed, the net_change would need to be updated so we don't print it out a second time

hauntsaninja commented 3 years ago

Thanks again!