vlasovskikh / funcparserlib

Recursive descent parsing library for Python based on functional combinators
https://funcparserlib.pirx.ru
MIT License
338 stars 38 forks source link

Improve typing support for _TupleParser #80

Open beekill95 opened 4 months ago

beekill95 commented 4 months ago

Right now, when working with 3 parsers combined together, the resulting type of the combined parser is Any.

For instance,

from typing import reveal_type
from funcparserlib.parser import a

expr = a("x") + a("y") + a("z")
reveal_type(expr) # Parser[str, Any]

, and with 4 parsers, the resulting type is Tuple:

expr2 = a("x") + a("y") + a("z") + a("w")
reveal_type(expr2) # _TupleParser[str, Tuple[Any, str]]

This is troublesome when working with mypy for type checking. For example, the second code snippet will lead to mypy error:

def parse(args: tuple[str, str, str, str]):
    pass

expr2 = (a("x") + a("y") + a("z") + a("w")) >> parse
# Unsupported operand types for >> ("_TupleParser[str, tuple[Any, str]]" and "Callable[[tuple[str, str, str, str]], int]")

Ideally, for both examples, the resulting parsers should be _TupleParser, and the resulting types should be recognized correctly, tuple[str, str, str] and tuple[str, str, str, str] respectively.

beekill95 commented 4 months ago

I've made some changes to the code to support the above cases by utilizing TypeVarTuple (newly introduced for python 3.11, but we can use typing-extensions to bring the functionality to earlier python versions). Would you be interested in this? I could make a PR for this.

vlasovskikh commented 4 months ago

@beekill95 Thanks for your idea! I managed to cover cases for 2-tuples with types, giving up on tuples of 3+ elements. The idea was to start small with types here. This issue for 4-tuples (and apparently on 6-tuples, 8-tuples, etc.) wasn't something I had unit tests for.

I'm open to improvements in typing for more complex cases, but I'd like to preserve compatibility with all supported versions of Python (3.8+ as for now). Also, I'd like for funcparserlib to keep having no package dependencies (dev-dependencies are OK though).

Are there any ways to improve typing for parsers of 3+ tuples without dropping compatibility with 3.8? Maybe via a dev-dependency on typing-extenstions with imports guarded by typing.TYPE_CHECKING?

beekill95 commented 4 months ago

I don't know how the typing would behave if we just install typing-extensions as dev-dependency for users with python < 3.11. For users with mypy already installed, then typing-extensions would also be implicitly installed on their systems and the typing would work. However, for users without typing-extensions explicitly/implicitly installed, I guess we could tell users to install typing-extensions as dev-dependency as well so the typing works correctly. How do you feel about this?

Another option is to add only the relevant parts of TypeVarTuple to the codebase: https://github.com/python/typing_extensions/blob/main/src/typing_extensions.py#L2257, but I don't know how viable it would be?

For unit testing, we could utilize typing_extensions.assert_type() to write the tests.

vlasovskikh commented 4 months ago

@beekill95 I've experimented with TypeVarTuple in several combinations:

I haven't found a combination that was good for most cases for Python < 3.11.

My best shot was this Patch A:

Index: tests/test_parsing.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/tests/test_parsing.py b/tests/test_parsing.py
--- a/tests/test_parsing.py (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/tests/test_parsing.py (date 1714172841079)
@@ -1,7 +1,17 @@
 # -*- coding: utf-8 -*-

+import sys
 import unittest
-from typing import Optional, Tuple
+from typing import Optional, Tuple, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    if sys.version_info >= (3, 11):
+        from typing import TypeVarTuple, Unpack
+    else:
+        from typing_extensions import TypeVarTuple, Unpack
+
+    # noinspection PyTypeChecker
+    Ts = TypeVarTuple("Ts")

 from funcparserlib.lexer import TokenSpec, make_tokenizer, LexerError, Token
 from funcparserlib.parser import (
@@ -19,6 +29,19 @@
 )

+def type_var_tuple_example(xs: "Tuple[Unpack[Ts]]") -> "Tuple[int, Unpack[Ts]]":
+    return (1,) + xs
+
+
+def use_type_var() -> None:
+    x: Tuple[int, str, int] = type_var_tuple_example(("foo", 1))
+    y: Tuple[int, str, int] = type_var_tuple_example(("foo", 1, 2))
+    print(x, y)
+
+    s, i = type_var_tuple_example(("foo",))
+    print(s + i)
+
+
 class ParsingTest(unittest.TestCase):
     def test_oneplus(self) -> None:
         x = a("x")
Index: .pre-commit-config.yaml
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
--- a/.pre-commit-config.yaml   (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/.pre-commit-config.yaml   (date 1714089653084)
@@ -12,6 +12,8 @@
     rev: "v1.7.0"
     hooks:
       - id: mypy
+        additional_dependencies:
+          - typing-extensions==4.11.0
   - repo: local
     hooks:
       - id: unittest

But Patch A didn't work with code inspections in PyCharm. It produced false positive errors for functions that use typing.Unpack in its return types. About 30% of Python developers use PyCharm, so it's better not to interfere with their editing experience, even though it's actually a bug in PyCharm's code inspections.

Another version was to run Mypy and take into account IDE inspections only for Python 3.11 and 3.12, this is Patch B:

Index: tests/test_parsing.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/tests/test_parsing.py b/tests/test_parsing.py
--- a/tests/test_parsing.py (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/tests/test_parsing.py (date 1714173330602)
@@ -1,7 +1,12 @@
 # -*- coding: utf-8 -*-

 import unittest
-from typing import Optional, Tuple
+from typing import Optional, Tuple, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from typing import TypeVarTuple
+
+    Ts = TypeVarTuple("Ts")

 from funcparserlib.lexer import TokenSpec, make_tokenizer, LexerError, Token
 from funcparserlib.parser import (
@@ -19,6 +24,19 @@
 )

+def type_var_tuple_example(xs: "Tuple[*Ts]") -> "Tuple[int, *Ts]":  # noqa: F722
+    return (1,) + xs
+
+
+def use_type_var() -> None:
+    x: Tuple[int, str, int] = type_var_tuple_example(("foo", 1))
+    y: Tuple[int, str, int] = type_var_tuple_example(("foo", 1, 2))
+    print(x, y)
+
+    s, i = type_var_tuple_example(("foo",))
+    print(s + i)
+
+
 class ParsingTest(unittest.TestCase):
     def test_oneplus(self) -> None:
         x = a("x")
Index: .pre-commit-config.yaml
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
--- a/.pre-commit-config.yaml   (revision 18c0a99dcdb427e35226c74b7cc2617223c8e1fa)
+++ b/.pre-commit-config.yaml   (date 1714173262135)
@@ -12,6 +12,7 @@
     rev: "v1.7.0"
     hooks:
       - id: mypy
+        language_version: "3.12"
   - repo: local
     hooks:
       - id: unittest

I wonder how many people may still want to use Mypy or edit their code in IDEs with Python 3.10 or older. Patch B won't work for them and they won't be able to fix things by installing typing_extensions.

At this moment I haven't come up with better options that I would be confident to introduce. I'm open to new options.

vlasovskikh commented 4 months ago

@beekill95 I'm wondering how your PR might look like... If it solves typing for tuple parsers of any length, it may convince me to drop support for 3.10 and earlier from the code editor perspective. It would still be possible to use funcparserlib with 3.8-3.10 (e.g. for anyone still using it as a dependency in production). Only actual code editing experience for 3.8-3.10 will become worse.

beekill95 commented 4 months ago

@vlasovskikh, I just created a draft PR #81. I've tested on my editors (VScode/Neovim + pyright lsp) with python 3.10, the type hinting seems correct for me:

from __future__ import annotations

from typing import reveal_type

from funcparserlib.parser import a

expr1 = a("x") + a("y") + (a("1") >> int) + (a("1.2") >> float)
reveal_type(expr1) # Type of "expr1" is "_TupleParser[str, str, str, int, float]"
reveal_type(expr1.__rshift__) # Type of "expr1.__rshift__" is "(f: (Tuple[str, str, int, float]) -> _C@__rshift__) -> Parser[str, _C@__rshift__]"

You're right about Pycharm, they're still implementing the support for TypeVarTuple (https://youtrack.jetbrains.com/issue/PY-53105/Support-PEP-646-Variadic-Generics).

Let me know how you feel about this approach.

vlasovskikh commented 4 months ago

@beekill95 I've quickly looked through the PR and noticed a few possible problems with typing. Pre-commit checks have caught these problems too.

Based on your idea about TypeVarTuple for parsers of 3+ tuples, I've experimented with TypeVarTuple for a simpler example, that models parsers pretty well, despite its simplicity. Here's what I got. It passes Mypy checks as well as unit tests. It checks (almost) every combination of adding simple values, ignored values, and tuples of values for sizes 1 to 4. 5 and more should be similar to 3 and 4. We could adopt this example as the basis for updating the __add__ for parser.

Source code of the example:

from __future__ import annotations
from dataclasses import dataclass, field
from typing import Generic, TypeVar, TypeVarTuple, assert_type, overload
from unittest import TestCase

T = TypeVar("T")
V = TypeVar("V")
Ts = TypeVarTuple("Ts")
Ks = TypeVarTuple("Ks")

@dataclass
class Ign:
    value: None = field(default=None, init=False)

    @overload
    def __add__(self, other: Ign) -> Ign:
        pass

    @overload
    def __add__(self, other: Val[T]) -> Val[T]:
        pass

    @overload
    def __add__(self, other: Tpl[*Ts]) -> Tpl[*Ts]:
        pass

    def __add__(self, other: Val[T] | Tpl[*Ts] | Ign) -> Val[T] | Tpl[*Ts] | Ign:
        if isinstance(other, Ign):
            return self
        else:
            return other

@dataclass
class Val(Generic[T]):
    value: T

    def __neg__(self) -> Ign:
        return Ign()

    @overload
    def __add__(self, other: Ign) -> Val[T]:
        pass

    @overload
    def __add__(self, other: Val[V]) -> Tpl[T, V]:
        pass

    @overload
    def __add__(self, other: Tpl[*Ts]) -> Tpl[T, tuple[*Ts]]:
        pass

    def __add__(
        self, other: Val[V] | Tpl[*Ts] | Ign
    ) -> Tpl[T, V] | Tpl[T, tuple[*Ts]] | Val[T]:
        if isinstance(other, Ign):
            return self
        else:
            return Tpl((self.value, other.value))  # type: ignore[return-value]

@dataclass
class Tpl(Generic[*Ts]):
    value: tuple[*Ts]

    @overload
    def __add__(self, other: Ign) -> Tpl[*Ts]:
        pass

    @overload
    def __add__(self, other: Val[T]) -> Tpl[*Ts, T]:
        pass

    @overload
    def __add__(self, other: Tpl[*Ks]) -> Tpl[*Ts, tuple[*Ks]]:
        pass

    def __add__(
        self, other: Val[T] | Tpl[*Ks] | Ign
    ) -> Tpl[*Ts, T] | Tpl[*Ts] | Tpl[*Ts, tuple[*Ks]]:
        if isinstance(other, Ign):
            return self
        else:
            return Tpl(self.value + (other.value,))  # type: ignore[return-value]

class TypeVarTupleTest(TestCase):
    def test_every_1(self) -> None:
        assert_type(Val("a"), Val[str])
        assert_type(-Val("a"), Ign)

        self.assertEqual(Val("a").value, "a")
        self.assertEqual((-Val("a")).value, None)

    def test_every_2(self) -> None:
        assert_type(-Val("a") + -Val("b"), Ign)
        assert_type(Val("a") + -Val("b"), Val[str])
        assert_type(-Val("a") + Val("b"), Val[str])
        assert_type(Val("a") + Val("b"), Tpl[str, str])

        self.assertEqual((-Val("a") + -Val("b")).value, None)
        self.assertEqual((Val("a") + -Val("b")).value, "a")
        self.assertEqual((-Val("a") + Val("b")).value, "b")
        self.assertEqual((Val("a") + Val("b")).value, ("a", "b"))

    def test_all_3_ignored_of_3(self) -> None:
        assert_type(-Val("a") + -Val("b") + -Val("c"), Ign)
        assert_type(-Val("a") + (-Val("b") + -Val("c")), Ign)

        self.assertEqual((-Val("a") + -Val("b") + -Val("c")).value, None)
        self.assertEqual((-Val("a") + (-Val("b") + -Val("c"))).value, None)

    def test_every_2_ignored_of_3(self) -> None:
        # Left-associative
        assert_type(Val("a") + -Val("b") + -Val("c"), Val[str])
        assert_type(-Val("a") + Val("b") + -Val("c"), Val[str])
        assert_type(-Val("a") + -Val("b") + Val("c"), Val[str])

        self.assertEqual((Val("a") + -Val("b") + -Val("c")).value, "a")
        self.assertEqual((-Val("a") + Val("b") + -Val("c")).value, "b")
        self.assertEqual((-Val("a") + -Val("b") + Val("c")).value, "c")

        # Right-associative
        assert_type(Val("a") + (-Val("b") + -Val("c")), Val[str])
        assert_type(-Val("a") + (Val("b") + -Val("c")), Val[str])
        assert_type(-Val("a") + (-Val("b") + Val("c")), Val[str])

        self.assertEqual((Val("a") + (-Val("b") + -Val("c"))).value, "a")
        self.assertEqual((-Val("a") + (Val("b") + -Val("c"))).value, "b")
        self.assertEqual((-Val("a") + (-Val("b") + Val("c"))).value, "c")

    def every_1_ignored_of_3(self) -> None:
        # Left-associative
        assert_type(Val("a") + Val("b") + -Val("c"), Tpl[str, str])
        assert_type(Val("a") + -Val("b") + Val("c"), Tpl[str, str])
        assert_type(-Val("a") + Val("b") + Val("c"), Tpl[str, str])

        self.assertEqual((Val("a") + Val("b") + -Val("c")).value, ("a", "b"))
        self.assertEqual((Val("a") + -Val("b") + Val("c")).value, ("a", "c"))
        self.assertEqual((-Val("a") + Val("b") + Val("c")).value, ("b", "c"))

        # Right-associative
        assert_type(Val("a") + (Val("b") + -Val("c")), Tpl[str, str])
        assert_type(Val("a") + (-Val("b") + Val("c")), Tpl[str, str])
        assert_type(-Val("a") + (Val("b") + Val("c")), Tpl[str, str])

        self.assertEqual((Val("a") + (Val("b") + -Val("c"))).value, ("a", "b"))
        self.assertEqual((Val("a") + (-Val("b") + Val("c"))).value, ("a", "c"))
        self.assertEqual((-Val("a") + (Val("b") + Val("c"))).value, ("b", "c"))

    def test_all_3(self) -> None:
        assert_type(Val("a") + Val("b") + Val("c"), Tpl[str, str, str])
        assert_type(Val("a") + (Val("b") + Val("c")), Tpl[str, tuple[str, str]])

        self.assertEqual((Val("a") + Val("b") + Val("c")).value, ("a", "b", "c"))
        self.assertEqual((Val("a") + (Val("b") + Val("c"))).value, ("a", ("b", "c")))

    def test_all_4(self) -> None:
        assert_type(
            Val("a") + Val("b") + Val("c") + Val("d"),
            Tpl[str, str, str, str],
        )
        assert_type(
            Val("a") + Val("b") + (Val("c") + Val("d")),
            Tpl[str, str, tuple[str, str]],
        )
        assert_type(
            Val("a") + (Val("b") + Val("c")) + Val("d"),
            Tpl[str, tuple[str, str], str],
        )
        assert_type(
            Val("a") + (Val("b") + Val("c") + Val("d")),
            Tpl[str, tuple[str, str, str]],
        )

        self.assertEqual(
            (Val("a") + Val("b") + Val("c") + Val("d")).value,
            ("a", "b", "c", "d"),
        )
        self.assertEqual(
            (Val("a") + Val("b") + (Val("c") + Val("d"))).value,
            ("a", "b", ("c", "d")),
        )
        self.assertEqual(
            (Val("a") + (Val("b") + Val("c")) + Val("d")).value,
            ("a", ("b", "c"), "d"),
        )
        self.assertEqual(
            (Val("a") + (Val("b") + Val("c") + Val("d"))).value,
            ("a", ("b", "c", "d")),
        )
vlasovskikh commented 4 months ago

@east825 ^ FYI this example is correct from the perspective of PEP 646. Both Mypy and Pylance are fine with it. However, in PyCharm, it results in IllegalArgumentException: Unbounded unpacked tuple type of a TypeVarTuple or another unpacked tuple type is now allowed. Sorry to bother you with this mention, I hope you don't mind, feel free to ignore it :)

beekill95 commented 4 months ago

@vlasovskikh, I've fixed some of the pre-commit errors, however, it seems that this is not as easy as I thought. Here are the problems I'm facing:

  1. I cannot put the typing_extensions imports behind TYPE_CHECKING as _TupleParser inherits from Parser[A, Tuple[Unpack[_Ts]]], which means the class depends on Unpack at runtime, not just when type checking.
  2. Incompatible return value type between Parser.__add__() and _TupleParser.__add__(). The reason being that when calling super().__add__(), the self becomes Parser[A, tuple[*_Ts]] and the return type is then _TupleParser[A, tuple[*_Ts], Any] while the return type should be _TupleParser[A, *_Ts, B]. To resolve this, I think I will have to move the logic of adding two parsers to _TupleParser.__add__() and avoid calling super().__add__() entirely.
vlasovskikh commented 3 months ago

@beekill95 Sorry for the delay in replying to your comment and your updated PR. I've looked at this problem shortly and haven't found a working solution yet. It looks like it won't be enough to make changes to type signatures. We may have to make _TupleParser public and not a child of Parser... I'm not sure about it yet. I'm afraid I have to keep it on pause from my side for now. I'll get back to it when I have more free time 😞 By the way, I still consider better typing as a possible direction, maybe for a major incompatible release with dropping compatibility with 3.10 and older.