python / typeshed

Collection of library stubs for Python, with static types
Other
4.39k stars 1.77k forks source link

Match.group can return None #3902

Closed rmorshea closed 3 years ago

rmorshea commented 4 years ago

All usages of Match.group appear to return strings however, for optional named groups it can return None:

import re
assert re.match("(?P<numbers>[1-9]+)?", "abc").group("numbers") is None
srittau commented 4 years ago

There is a note in the stubs:

https://github.com/python/typeshed/blob/23e380ac83e1f8287a346263375437a605ab98fc/stdlib/3/typing.pyi#L549

This needs to be amended about it being potentially None in case of strings. Overloads and literals should be able to narrow this down. PR welcome.

djb4ai commented 4 years ago

@rmorshea Are you working on it? I can submit a patch if you want

djb4ai commented 4 years ago

@srittau When I am trying to replicate it since return type of re.match is Optional[Match[AnyStr]], mypy complains about calling .group on None is there any way to handle this scenario?

srittau commented 4 years ago

@lladhibhutall This should use overloads, e.g. the return type should depend on whether an argument is given, it is 0, another int or a str.

hauntsaninja commented 4 years ago

Note it can return None for (non-zero) numbered groups as well: assert re.match("(x)?y", "y").group(1) is None. This change, while correct, might hurt ergonomics in the happy case, pretty much all uses of numbered groups I've seen assume if there's a match their groups will be not None.

rmorshea commented 4 years ago

IMO having a false positive (i.e. the return type is Optional[str] for required match groups) is better than a false negative, but if the maintainers want to keep as-is to avoid inconvenience I'd understand that as well.

hauntsaninja commented 4 years ago

typeshed generally prefers false negatives over false positives (https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#what-to-include), but maybe Sebastian has an opinion on whether to make Optional the return type of zero, one or both of named and (non-zero) numbered groups.

hauntsaninja commented 4 years ago

It looks like the change was made, and then reverted in https://github.com/python/typeshed/pull/3190

rmorshea commented 4 years ago

Ok, seems like false-positive caused problems. Thanks @hauntsaninja.

hauntsaninja commented 4 years ago

There's also some hope a plugin could improve results here, e.g.: https://github.com/python/mypy/pull/7803

mr-c commented 4 years ago

This is quite needed for mypyc users. Can this issue be re-opened?

srittau commented 4 years ago

Reopening. I believe we should strive towards correctness here and we can ensure this at least somewhat using overloads and literals.

hauntsaninja commented 4 years ago

An ideal solution for mypy users here is a plugin (https://github.com/python/mypy/pull/7803).

The no argument case and the Literal[0] case should return str; this should be uncontroversial. The multiple argument case should return a tuple of whatever we decide the single argument case should return.

However everything else, which comprises most of the usage of group, can potentially return None. If you assume that non-optional capture groups are more common than optional capture groups, this is potentially a lot of false positives.

As discussed in https://github.com/mypyc/mypyc/issues/676, one option here is to mark the return type as Union[Any, str]. This would sidestep the false positive problem, and since mypyc was mentioned, fix things for mypyc users. This option looks increasingly good to me!

Anecdotally, I literally wrote code today that would trigger a false positive if we instead went with Optional[str]. But a quick scan of https://grep.app/search?q=.group%28&filter[lang][0]=Python seems to indicate at least half of all usage of group would get flagged incorrectly.

KotlinIsland commented 10 months ago

Basedmypy works correctly in these cases:

import re

s: str
if m := re.match("a(b)?(c)", s):
    reveal_type(m.groups())  #  (str | None, str)