mrocklin / multipledispatch

Multiple dispatch
https://multiple-dispatch.readthedocs.io/en/latest/
Other
808 stars 71 forks source link

Adding predicative dispatch and annotations #64

Open DavidMertz opened 7 years ago

DavidMertz commented 7 years ago

This is more an informational note than a real issue, most likely. I have forked the project to https://github.com/ContinuumIO/multipledispatch. I might also rename the fork, but I'm not sure what name I would use yet.

There are two major features I'd like to add, but these quite likely are not things you'd want in this project itself. I've forked to the ContinuumIO namespace because if I find time for these things. The first change is I'd like to (optionally) support annotations. For example, whereas current spelling is:

@dispatch(int, int)
def ceil_add(i, j):
    return i+j

@dispatch(float, float)
def ceil_add(x, y):
    from math import ceil
    return int(ceil(x+y))

I'd like to be able to spell this in a manner more-or-less consistent with PEP 484

@multimethod
def ceil_add(i: int, j: int) -> int:
    return i+j

@multimethod
def ceil_add(x: float, y: float) -> int:
    return int(ceil(x+y))

multimethod may or may not be just another name for dispatch, but it would certainly share most of the same machinery.


The second thing was inspired by a group discussion involving Peter Wang. He spoke affectionately of Picture clauses in COBOL (or at least the concept: https://en.wikipedia.org/wiki/COBOL#PICTURE_clause). That is, Peter likes the idea of the function following the "shape" of the data, not only its type. A million years ago, Phillip J Eby's PEAK did something like this, i.e. "predicate dispatch" and I've been wanting to implement this in a more modern context for a long time (See http://gnosis.cx/publish/programming/charming_python_b22.html for a discussion). I had an even more ancient multimethods package that I once wrote (http://gnosis.cx/download/gnosis/magic/multimethods.py), but starting with this project feels like it makes more sense.

So in particular, I'd like to be able to write something like this:

from operator import mul
from functools import reduce
from math import sqrt, e, pi as π

@multimethod
def approx_factorial(n: Predicate["int < 100_000"]) -> int:
    return reduce(mul, range(1,n+1), 1)

@multimethod
def approx_factorial(n: int) -> int:
    return int(sqrt(2*π*n)) * int(n/e)**n

Note: This is a terrible approximation because casting int(n/e) introduces large errors. However, if we do not do it, floats quickly overflow in the **n. What we actually need to do is to take n/e to the largest power we can without float overflow, then multiply those partial exponents together as ints until we've got n worth of them. But the one-liner is meant to illustrate the interface not the function implementation.

For data in something like more like a picture clause (e.g. in processing a CSV file), we might have something like:

@multimethod
def process_date(dt: DateTime["yyyy-mm-dd"]) -> float:
    # ... do something with some sort of datetime object
    return the_answer

@multimethod
def process_date(dt: DateTime["mm/dd/yy"]) -> float:
    # ... do something with some sort of datetime object
    return the_answer
mrocklin commented 7 years ago

You may want to review https://github.com/mrocklin/multipledispatch/pull/4 for python 3 annotations.

If you want to do more complex predicates then I encourage you to pay attention to performance. The current implementation adds only a few microseconds of overhead. I suspect it will be challenging to have more complex constraints, maintain consistency and hierarchy, and maintain performance.

Enjoy the fork!

DavidMertz commented 7 years ago

I'm not clear whether #4 is merged in. If so, that is a lot of what I want in the extra functionality. I'll look at the code and figure it out.

Predicative dispatching absolutely kills performance. There's no way around that since it's just basically an alternate way to spell a conditional check inside a function body. If I manage to do this, the documentation would need to be clear on this. But if we have it, of course we could write:

@multimethod
def some_func(i: Predicate["np_hard_check(int)"]):
    # ... do stuff ...

But that's no worse than:

def plain_func(i):
    if np_hard_check(i):
        # ... do stuff ...
    else:
        # ... do other stuff ...

I'm not really sure about the spelling of using the type as the variable "name" inside the expression. In concept, I'd like to rule out checking the predicate when the type doesn't match to start with. Perhaps it would be more explicit to use:

@multimethod
def some_func(i: Predicate[int, "np_hard_check(_)"]):
    # ... do stuff ...
llllllllll commented 6 years ago

I think that given a small DSL of expressions, you could implement this in a reasonable way. Imagine a DSL which supported inequalities and type checks; for example:

import multipledispatch as md

@md.dispatch(md.Int() < 10)
def f(x):
    print('x is less than 10')

@md.dispatch(md.Int() < 100)
print('x is less than 100')

@md.dispatch(int)
def f(x):
    return f(md.PredicateBox(x))  # idk what the name of this is

This allows us to maintain a partial ordering of functions based on the input arguments. To support this, you need to lift the expression into the type and probably box the type of the object being dispatched so that issubclass will work as expected. Supporting just inequalities also solves a real use case: delegate to different functions based on the size of the dataset. For example, the C stdlib function qstort uses three different sorting algorithms and dispatches based on the length of the input.

This is not unrelated to how I implemented VarArgs[...] for dispatching based on the members of sequences which is still fast enough to work for a real project. Basically, given a restricted set of operations we can still keep this efficient, the problem only arises when you want arbitrary predicates. I think that it is fine to intentionally limit the scope of the predicates because the point of multipledispatch is to create a declarative interface for a very common pattern. If you started adding a lot of custom predicates, your code is already not following the common pattern multipledispatch targets and the result will likely be no more readable than an explicit if clause.

I am not sure that I would want to make this part of multipledispatch itself, but it should be easy enough to make a standalone library that interfaces naturally with multipledispatch like VarArgs does.

pertsevds commented 2 years ago

Has anyone seen https://github.com/septumca/pypd ?