jspahrsummers / adt

Algebraic data types for Python (experimental, not actively maintained)
MIT License
172 stars 14 forks source link

avoid dangerous designs & suggestions #18

Closed thautwarm closed 5 years ago

thautwarm commented 5 years ago

First, about #13, FYI, Pampy is not a real pattern matching(hereafter PM) library. It is atmost an implememtation of an interpreter that supports very limited and inextensible PM. Further, there has already been thousands of "plausive patterm matching" implemented for Python, and IMHO MacroPy could be a more correct one. Pampy made use of the information asymmetry to be attractive(even the 10-year-ago similar libraries didn't boast that way), and finally successfully become a cheater.

Your notations is a bit similar to Pampy, and a.match(case=action_under_case) seems to be so much evil, unmaintainable.

More, I suggest you to focus on ADTs instead of getting burdened from the mess of "pattern matching for PY". We can use method dispatch to achieve a part of PM's functionality, as you know we have typing.overload which works well.

Besides, there could be some feasible ways to achieve the true PM for Python without requiring any new syntax constructs:

Core idea: convert code objects(sorry you cannot get ASTs in runtime, except you choose the evil and unreliable inspect.getsource), to change semantics for specific structures; simultaneously add type checking support. E.g., you can use the notation like

with val_to_match:
    if Case(p1, p2, ...):
        ...
    if ....

to denote

case val_to_match of
    Case(p1, p2, ...) -> ...
     ...

HOW TO

You might also google CASE TREE or TREE PATTERN MATCHING for the canonical algorithms of PM.

thautwarm commented 5 years ago

Also, your mypy_plugin.py is cool!

jspahrsummers commented 5 years ago

Thanks for starting this discussion! I'm not very familiar with pampy, but I agree that it doesn't look very general or extensible (certainly not in the way we need).

I'm certainly open to other/additional syntaxes for pattern matching, as I'm not fully satisfied with a.match(case=action_under_case) either. However, I think this current API does have the nice property that one could it write by hand, for their own classes, without any magic like this library provides—which I presume is an important trait for a lot of Python developers. For a lot of the alternatives we may look at, they may be too arcane and turn people away from the library before really giving it a shot.

Could you describe the typing.overload approach you alluded to in more detail? How would it work with Cases which have the same types (e.g., Expression in the README)?

Your with syntax is certainly intriguing! I'd love to learn more about that as well. 😄Would you have any interest in submitting it as a pull request for discussion?

jspahrsummers commented 5 years ago

I should further note that if it's possible to outsource pattern matching to another library, which this one integrates with/exposes/depends upon, I'd be more than happy with that outcome. I'm just not well-versed in what's out there which might solve this problem more elegantly.

thautwarm commented 5 years ago

Could you describe the typing.overload approach you alluded to in more detail? How would it work with > Cases which have the same types (e.g., Expression in the README)?

As you can write your own mypy_plugin.py, it's not that necessary. If you do have insterest in this, you can recall these problems.

Your with syntax is certainly intriguing! I'd love to learn more about that as well. smileWould you have any interest in submitting it as a pull request for discussion?

I used to discuss some related stuffs with @laike9m recently, and IMO the library uncompyle and bytecode make a lot of sense in this domain. I just had a try today and FIGURED it out!

Check these files:

test.py

from pattern_matching import *

class MatchError(Exception):
    pass

class C:
    @classmethod
    def __match__(cls, x, i):
        if i is not 2:
            raise MatchError
        return x, x

@syntax_rule(PatternMatching().visit, debug=True)
def f(x):
    with match(x):
        if C(C(a, b), C(c, d)): print(a, b, c, d)

f(1)

generated codes:

def f(x):
    PM140513997965800.0 = x
    try:
        (PM140513997965800.1, PM140513997965800.2) = C.__match__(PM140513997965800.0, 2)
        (PM140513997965800.3, PM140513997965800.4) = C.__match__(PM140513997965800.1, 2)
        (PM140513997965800.5, PM140513997965800.6) = C.__match__(PM140513997965800.2, 2)
        a = PM140513997965800.3
        b = PM140513997965800.4
        c = PM140513997965800.5
        d = PM140513997965800.6
        print(a, b, c, d)
    except MatchError:
        raise MatchError

and corresponding stdout:

1 1 1 1

syntax_rule.py :

from uncompyle6 import deparse_code, PYTHON_VERSION
from io import StringIO
from inspect import Signature, _empty as empty
from types import FunctionType
from textwrap import indent
from time import time
import ast

try:
    from rbnf.py_tools.unparse import Unparser as print_ast
except ImportError:
    try:
        from astpretty import print_ast
    except ImportError:
        print_ast = print

try:
    from typing import *
except:
    pass

class Var:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

def syntax_rule(transformer, debug=False):
    return lambda f: _syntax_rule(f, transformer, debug)

def _syntax_rule(f, transformer, debug):
    mangle = str(time()).replace(".", "_")

    assert isinstance(f, FunctionType)
    sio = StringIO()

    deparse_code(PYTHON_VERSION, f.__code__, out=sio)
    func_body_codestr = sio.getvalue()

    # `func_body_codestr` has no info of function head,
    # thus we should get the header manually.
    signature = Signature.from_callable(f)

    # for Python 3.6-, we should get the
    # correct order of function parameters.
    varnames = f.__code__.co_varnames
    params = sorted(signature.parameters.items(),
                    key=lambda pair: varnames.index(pair[0]))

    # Now, note that not all default value of a parameter
    # can be represented(via `repr`) directly. Say,
    #
    # ```
    # class S: pass
    # def f(a=S()):
    #   pass
    # ```
    #
    # in above codes you just cannot deparse the code object
    # into source code like
    #   `def f(a=<__main__.S object at 0x7f8c8c1692e8>): ...
    #
    # Also similar issues get raised up if any free variable here.
    #
    # As a result, please check my following codes for resolutions.

    freevars = {}
    for (name, param) in params:
        can_have_objects = ("default", "annotation")
        for obj_name in can_have_objects:
            obj = getattr(param, obj_name)
            if obj is not empty:
                # mangling here
                var_name = "_%s_%d" % (mangle, len(freevars))
                freevars[var_name] = obj
                setattr(param, "_" + obj_name, Var(var_name))

    for name, freevar in zip(f.__code__.co_freevars, f.__closure__ or ()):
        freevars[name] = freevar

    # the function header
    header = "def {name}{sig}:".format(name=f.__name__, sig=str(signature))
    func_def_codestr = header + "\n" + indent(func_body_codestr, prefix=" "*2)

    fn_ast = ast.parse(func_def_codestr).body[0]

    # perform your transformation on the function's AST.
    fn_ast = transformer(fn_ast)

    # debug
    if debug:
        ast.fix_missing_locations(fn_ast)
        print_ast(fn_ast)

    # Now we have all code piece for the function definition, but we
    # should handle the closures/default args.
    freevars = list(freevars.items())

    ast_for_all = ast.FunctionDef(
        # also mangling here
        name=".closure_func",
        args=ast.arguments(
            args=[ast.arg(arg=freevar_name, annotation=None) for (freevar_name, _) in freevars],
            vararg=None,
            kwonlyargs=[],
            kw_defaults=[],
            kwarg=None,
            defaults=[],
            ),
        body=[fn_ast, ast.Return(ast.Name(f.__name__, ctx=ast.Load()))],
        decorator_list=[],
        returns=None,
        )
    ast.fix_missing_locations(ast_for_all)

    code = compile(ast.Module([ast_for_all]), f.__code__.co_filename, "exec")

    exec(code, f.__globals__)
    closure_func = f.__globals__['.closure_func']
    del f.__globals__['.closure_func']
    return closure_func(*[var for (_, var) in freevars])

if __name__ == '__main__':

    @syntax_rule(lambda x: x)
    def f(x, y=1, **kw):
        return x + y

pattern_matching.py:

import ast

from syntax_rule import *

def compare(v1, op, v2):
    return ast.Compare(v1, [op], [v2])

def if_else(exp, br1, br2):
    return ast.If(exp, body=br1, orelse=br2)

def assign_name(name, val):
    return ast.Assign([ast.Name(name, ctx=ast.Store())], val)

def raise_not_match(_):
    """
    # TODO: more human-friendly error reporting
    """
    return ast.Raise(exc=ast.Name(id="MatchError", ctx=ast.Load()), cause=None)

class CaseCompilation(ast.NodeVisitor):
    """
    https://mail.python.org/pipermail/python-ideas/2015-April/032920.html

    with match(expr):
        if C(a, b): do_some1
        if _: do_some2
    =>
    .r0 = expr
    try:
        .r1 = C.__match__(.r0, 2)
        (.r2.a, .r3.b) = .r1
        a = .r2.a
        b = .r3.b
        do_some # with a and b
    except MatchError:
        try:
            r = .r0
        except:
            raise MatchError

            ...
    """
    def __init__(self, name_of_val_to_match, captures, block, pat: 'PatternMatching'):
        """
        :param captures: a dict maps mangling names to local names
        """
        self.name_of_val_to_match = name_of_val_to_match
        self.block = block # type: list
        self.pointer = None
        self.pat = pat
        self.captures = captures

    @property
    def val_to_match(self):
        return ast.Name(self.name_of_val_to_match, ctx=ast.Load())

    def visit_Num(self, v: ast.Num):
        self.visit_value(v.n)

    def visit_Str(self, v: ast.Str):
        self.visit_value(v.s)

    def visit_Name(self, v: ast.Name):
        self.captures[self.name_of_val_to_match] = v.id

    def visit_NameConstant(self, v: ast.NameConstant):
        self.visit_value(v.value)

    def visit_Constant(self, c: ast.Constant):
        self.visit_value(c.value)

    def visit_value(self, i):
        cond = compare(self.val_to_match, ast.NotEq(), ast.Constant(i))
        raise_ = raise_not_match(i)
        self.block.append(if_else(cond, [raise_], []))

    def visit_Call(self, call: ast.Call):
        """
        for constructors/recognizers
        """
        match = ast.Attribute(call.func, "__match__", ctx=ast.Load())
        matched = ast.Call(match, [self.val_to_match, ast.Constant(len(call.args))], keywords=[])
        ids = [self.pat.next_id for _ in call.args]
        lhs = ast.Tuple([ast.Name(id, ctx=ast.Store()) for id in ids], ctx=ast.Store())
        deconstruct = ast.Assign([lhs], matched, ctx=ast.Store())

        self.block.append(deconstruct)
        for id_, arg in zip(ids, call.args):
            CaseCompilation(id_, self.captures, self.block, self.pat).visit(arg)

class PatternMatching(ast.NodeTransformer):

    def __init__(self):
        def id_gen():
            i = 0
            while True:
                yield "PM%d.%d" % (id(self), i)
                i += 1
        self.local_id_generator = id_gen()

    @property
    def next_id(self):
        return next(self.local_id_generator)

    def visit_With(self, node: ast.With):
        # check if is the form:
        # ```
        # with case(_)
        # ```
        if not len(node.items):
            return self.generic_visit(node)

        item = node.items[0].context_expr
        if not isinstance(item, ast.Call):
            return self.generic_visit(node)

        fn = item.func
        if not isinstance(fn, ast.Name) or fn.id != "match":
            return self.generic_visit(node)

        # check if is `match(val)`
        assert not item.keywords and len(item.args) == 1
        # check if all stmts in the with block are in the form
        # `if <pattern>: stmts

        assert all(isinstance(stmt, ast.If) for stmt in node.body)

        val_to_match = item.args[0]
        name_of_val_to_match = self.next_id

        ifs = node.body # type: List[ast.If]
        def make_try_stmt(if_matched_br_, not_matched_br_):
            return ast.Try(
                    body=if_matched_br_,
                    handlers = [
                        ast.ExceptHandler(
                                type=ast.Name("MatchError", ctx=ast.Load()),
                                name=None,
                                body=not_matched_br_
                        ),
                    ],
                    orelse=[],
                    finalbody=[]
            )
        blocks = []

        for if_ in ifs:
            assert not if_.orelse # check if in the form of `if case: ...`
            captures = {}
            block = []
            case = if_.test
            case_compilation = CaseCompilation(name_of_val_to_match, captures, block, self)
            case_compilation.visit(case)
            for actual_name, local_bind_name in captures.items():
                block.append(assign_name(local_bind_name, ast.Name(actual_name, ctx=ast.Load())))
            block.extend(if_.body)
            blocks.append(block)
        blocks.reverse()

        # reduce
        last = [raise_not_match(None)]
        for each in blocks:
            last = [make_try_stmt(each, last)]

        return [assign_name(name_of_val_to_match, val_to_match), last[0]]

There also have been a long story about PM in Python-ideas, like the __match__ protocol: https://mail.python.org/pipermail/python-ideas/2015-April/032920.html .

And you should check these PM implementations though they are already very old: http://www.grantjenks.com/docs/patternmatching/#alternative-packages

thautwarm commented 5 years ago

I should further note that if it's possible to outsource pattern matching to another library, which this one integrates with/exposes/depends upon, I'd be more than happy with that outcome. I'm just not well-versed in what's out there which might solve this problem more elegantly.

Understood, but you might then have to consider how to support those protocols for the pattern matching libraries.

Could you please have a look at syntax_rule.py? I think you can use it to make a more elegant and efficient implementation of ADTs.

jspahrsummers commented 5 years ago

It's an impressive syntax transformation! My biggest concern is that additional magic like this would likely turn away developers who otherwise might consider using this library. I already worry about this being the case, so further movements in this direction would likely make it a non-starter for all but the most hardcore FP-inspired programmers.

Would it be possible to build the above example into a library which sits on top, and which we could reference in the README?

thautwarm commented 5 years ago

Yes, I understand your concerns. Actually I've made it to https://github.com/thautwarm/moshmosh

However it's only a propotype that needs further iterations. If you can call people to help with this PM idea(like making a new library refined from mine), we might have a battle-tested implememtation soon.

jspahrsummers commented 5 years ago

Awesome! I'll close this out for now, then. When you think it's ready to be used, I can link to it from adt's README as a nicer solution for pattern-matching. 😄

thautwarm commented 5 years ago

Thanks! But might not be that sooner.