metagraph-dev / metagraph

Multi-target API for graph analytics with Dask
https://metagraph.readthedocs.io/en/latest/
Apache License 2.0

[DISCUSSION] Relationship between typeclasses and data objects / wrappers and function signatures #4

Closed seibert closed 4 years ago

seibert commented 4 years ago

Here's a go at describing how to unify the ideas in both #2 and #3. My assumptions are that we need to specify:

There are still a bunch of aesthetic/functional choices here, so here are my proposed choices with some explanation:

Worked example

Let's use Graph and WeightedGraph to see how this works.

Abstract types

class GraphType(AbstractType):
    '''A graph is a collection of nodes and edges that connect nodes.'''
    pass  # nothing more to specify

class WeightedGraphType(GraphType):
    '''A graph that specifies a numeric weight value for each edge'''
    # a weighted graph can be converted to a graph, but a graph 
    # cannot be converted to a weighted graph
    pass

Note that abstract types are basically just a class and a docstring, with inheritance showing how things might be related to each other.
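
One way to read the inheritance relationship is as a one-way conversion rule. The following is a minimal sketch of that idea; the `can_convert` helper is hypothetical and not part of the proposal, and `AbstractType` here is an empty stand-in.

```python
class AbstractType:
    """Empty stand-in for the proposal's AbstractType base class."""


class GraphType(AbstractType):
    """A graph is a collection of nodes and edges that connect nodes."""


class WeightedGraphType(GraphType):
    """A graph that specifies a numeric weight value for each edge."""


def can_convert(src, dst):
    # Hypothetical helper: a value of abstract type `src` can stand in for
    # `dst` only if `src` is the same as, or more specific than, `dst`.
    return issubclass(src, dst)


print(can_convert(WeightedGraphType, GraphType))  # True
print(can_convert(GraphType, WeightedGraphType))  # False
```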

Wrapper classes

An instance of networkx.DiGraph meets our requirement for Graph but not WeightedGraph. For a weighted graph, we will need to define an extra wrapper class to carry the attribute name of the weight.

import networkx as nx

class NetworkXWeightedGraph:
    def __init__(self, graph, weight_label):
        assert isinstance(graph, nx.DiGraph)
        # weights live on edges, so every edge must carry the weight attribute
        assert all(
            weight_label in attrs for _, _, attrs in graph.edges(data=True)
        ), f"Graph is missing specified weight label: {weight_label}"
        self.graph = graph
        self.weight_label = weight_label

Concrete types

Now we need some types to describe the NetworkX instances above. Let's assume our base ConcreteType looks like this:

class ConcreteType:
    abstract = None  # must override this
    allowed_props = frozendict()  # default is no props
    target = 'cpu'  # key may be used in future to guide dispatch 

    def __init__(self, **props):  # abstract is set on the subclass, not per instance
        for key in props:
            if key not in self.allowed_props:
                raise KeyError(f'{key} not allowed property of {self.__class__}')
            # maybe type check?
        self.props = dict(props)  # copying to be paranoid

    def is_satisfied_by(self, other_type):
        # check if other_type is at least as specific as this one
        if not isinstance(other_type, self.__class__):
            return False
        for k in self.props:
            # a property that other_type does not specify cannot satisfy us
            if k not in other_type.props or self.props[k] != other_type.props[k]:
                return False
        return True

    def __eq__(self, other_type):
        return isinstance(other_type, self.__class__) and \
            self.props == other_type.props

    def isinstance(self, obj):
        # Return True if obj is an object described by this Concrete Type
        raise NotImplementedError()

    def get_props(self, obj):
        # Return a dict of properties that this object satisfies
        raise NotImplementedError()

The type for the NetworkX graph is:

class NetworkXGraphType(ConcreteType):
    abstract = GraphType
    value_class = nx.DiGraph
    allowed_props = frozendict({
        # placeholders for now
        'foo': bool,
        'bar': int,
    })

    def isinstance(self, obj):
        # Optional override: only needed when values of this type must be
        # identified programmatically; shown here as a plain class check
        return isinstance(obj, self.value_class)

And the weighted graph is:

class NetworkXWeightedGraphType(ConcreteType):
    abstract = WeightedGraphType
    value_class = NetworkXWeightedGraph
    allowed_props = frozendict({
        # placeholders for now
        'baz': str,
    })

Translator

A translator is a function that takes a value of one concrete type and maps it to a value of another concrete type (optionally with the desired type properties asserted). A translator might look like this:

@metagraph.translator
def nx_to_cugraph(src: NetworkXGraphType, **props) -> CuGraphType:
    ...  # insert implementation here

For simplicity of dispatch, a translator must be able to handle all properties of both the source and destination concrete type. The decorator is used to add any additional methods or attributes to the functions that the system will find useful. Note that the decorator does not record this function in any global registry (see below).
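
To make the decorator's role more concrete, here is a minimal sketch of what such a decorator might do. It is not metagraph's actual implementation: the `Translator` class and its `src_type` and `dst_type` attributes are assumptions chosen for illustration, and the built-in `list` and `tuple` stand in for concrete types. The decorator reads the annotations and wraps the function without touching any registry.

```python
import inspect


class Translator:
    """Callable wrapper exposing the source and destination types (hypothetical)."""

    def __init__(self, func):
        self.func = func
        self.__name__ = func.__name__
        sig = inspect.signature(func)
        # The first parameter's annotation is taken to be the source type
        first_param = next(iter(sig.parameters.values()))
        self.src_type = first_param.annotation
        self.dst_type = sig.return_annotation

    def __call__(self, src, **props):
        return self.func(src, **props)


def translator(func):
    # Only wraps the function; nothing is recorded in a global registry
    return Translator(func)


@translator
def list_to_tuple(src: list, **props) -> tuple:
    return tuple(src)


print(list_to_tuple.src_type, list_to_tuple.dst_type)
print(list_to_tuple([1, 2, 3]))  # (1, 2, 3)
```

Because the wrapper carries its own type information, a registry can later inspect `src_type` and `dst_type` when the translator is explicitly registered.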

Note that if a concrete type has properties, it is necessary to define a "self-translator", which is used to translate a value into one with the required properties:

@metagraph.translator
def nx_to_nx(src: NetworkXGraphType, **props) -> NetworkXGraphType:
    ...  # insert implementation here

The @metagraph.translator decorator turns the function into a callable object with additional properties:

Abstract Algorithm

Abstract Algorithms are just Python functions without implementations that have a type signature that includes Abstract Types. For example, the Louvain community detection might look like this:

from typing import List

@metagraph.abstract_algorithm(name='community.louvain')
def louvain(graph: GraphType) -> List[GraphType]:
    '''Return the louvain subgraphs'''
    pass

As with the translators, the decorator is used to add useful methods and attributes to the function, as we will see below.

Concrete Algorithm

Concrete algorithms look like the abstract algorithm, but use concrete types:

@metagraph.concrete_algorithm('community.louvain')
def nx_louvain(graph: NetworkXGraphType) -> List[NetworkXGraphType]:
    ...  # insert implementation here

Note that this decorator does not record the nx_louvain method in a registry hidden inside of the abstract louvain algorithm. Instead it converts the function into a callable class with attributes like:

If we want to define a concrete algorithm that only accepts values with a particular property (allowed properties are enumerated in the concrete type), we can do that this way:

@metagraph.concrete_algorithm('community.louvain')
def special_nx_louvain(graph: NetworkXGraphType(foo=True, bar=4)) -> List[NetworkXGraphType(foo=True)]:
    ...  # insert implementation here

This requires the input graph to have both foo=True and bar=4, and asserts that the return value has the property foo=True but makes no other claims about it.
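
The matching rule can be sketched with a stripped-down stand-in for ConcreteType. This is a sketch under assumptions rather than the final design: it uses `types.MappingProxyType` in place of `frozendict` to stay within the standard library, it omits the NetworkX value class, and it treats an unspecified property as not satisfying a requirement.

```python
from types import MappingProxyType


class ConcreteType:
    allowed_props = MappingProxyType({})  # read-only stand-in for frozendict

    def __init__(self, **props):
        for key in props:
            if key not in self.allowed_props:
                raise KeyError(f"{key} not allowed property of {self.__class__}")
        self.props = dict(props)

    def is_satisfied_by(self, other_type):
        # other_type must be the same kind of type and agree on every
        # property that this (required) type pins down
        if not isinstance(other_type, self.__class__):
            return False
        return all(
            k in other_type.props and other_type.props[k] == v
            for k, v in self.props.items()
        )


class NetworkXGraphType(ConcreteType):
    allowed_props = MappingProxyType({"foo": bool, "bar": int})


required = NetworkXGraphType(foo=True, bar=4)  # what special_nx_louvain asks for
exact = NetworkXGraphType(foo=True, bar=4)
wrong_bar = NetworkXGraphType(foo=True, bar=5)
unspecified = NetworkXGraphType(foo=True)      # bar is unknown

print(required.is_satisfied_by(exact))        # True
print(required.is_satisfied_by(wrong_bar))    # False
print(required.is_satisfied_by(unspecified))  # False
print(unspecified.is_satisfied_by(exact))     # True
```

The last line shows the asymmetry: a looser requirement (only foo=True) is satisfied by a more specific type, but not the other way around.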

Registration

For testing purposes, as well as for creating special contexts, we will want to encapsulate the state associated with the registry of types, translators, and algorithms. We call this state a Resolver, and it is responsible for:

There will be an implicit, global Resolver created by metagraph when imported that is populated by all of the plugins in the environment. Empty resolvers can also be created and populated manually.

The Resolver class will have methods like this:

from typing import List, Optional

class Resolver:
    def register(
        self,
        *,
        abstract_types: Optional[List[AbstractType]] = None,
        concrete_types: Optional[List[ConcreteType]] = None,
        translators: Optional[List[Translator]] = None,
        abstract_algorithms: Optional[List[AbstractAlgorithm]] = None,
        concrete_algorithms: Optional[List[ConcreteAlgorithm]] = None,
    ):
        pass

    def load_plugins(self):
        '''Populate registries with plugins from environment'''
        pass

    def typeof(self, obj):
        '''Returns fully specified concrete type of obj'''
        pass

    def convert_to(self, src_obj, dst_type, **props):
        '''Converts src_obj to instance of dst_type with given properties'''
        pass

    def match_algo(self, abstract_algo, arg_types, kwarg_types):
        '''Returns concrete algorithm that matches the given abstract
        algorithm and args/kwargs'''
        pass
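
To illustrate how a Resolver could keep registry state on the instance rather than in module globals, here is a deliberately simplified sketch. The `ConcreteAlgorithm` record, the use of plain Python classes in place of concrete types, and the reduced `match_algo` signature (abstract name plus positional argument types only) are all assumptions made for brevity, not the proposed API.

```python
class ConcreteAlgorithm:
    """Hypothetical record tying a function to an abstract algorithm name."""

    def __init__(self, abstract_name, arg_types, func):
        self.abstract_name = abstract_name
        self.arg_types = tuple(arg_types)  # plain classes stand in for concrete types
        self.func = func


class Resolver:
    def __init__(self):
        # All registry state lives on the instance
        self.concrete_algorithms = {}  # abstract name -> list of ConcreteAlgorithm

    def register(self, *, concrete_algorithms=None):
        for algo in concrete_algorithms or []:
            self.concrete_algorithms.setdefault(algo.abstract_name, []).append(algo)

    def match_algo(self, abstract_name, arg_types):
        # Return the first registered implementation whose parameter types
        # accept the given argument types
        for algo in self.concrete_algorithms.get(abstract_name, []):
            if len(algo.arg_types) == len(arg_types) and all(
                issubclass(given, wanted)
                for given, wanted in zip(arg_types, algo.arg_types)
            ):
                return algo
        raise TypeError(f"no concrete algorithm for {abstract_name!r} accepts {arg_types}")


def dict_louvain(graph):
    # Placeholder, not a real Louvain implementation: one community holding everything
    return [graph]


res = Resolver()
res.register(
    concrete_algorithms=[ConcreteAlgorithm("community.louvain", [dict], dict_louvain)]
)

graph = {"a": ["b"], "b": ["a"]}
algo = res.match_algo("community.louvain", [type(graph)])
print(algo.func(graph) == [graph])  # True

# A second resolver starts empty and shares no state with the first
empty = Resolver()
print(empty.concrete_algorithms)  # {}
```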

As a convenience, the resolver can also dynamically generate the algorithm namespace below it. For example:

res = Resolver()
res.load_plugins()

# dispatch and call immediately
mygroups = res.algo.community.louvain(mygraph)

# pick the concrete algo and return it
louvain_func = res.algo.community.louvain.match(mygraph)

eriknw commented 4 years ago

Quick note: dispatching based on literal could be useful if this literal affects the output types. For example (which is kinda ugly, but it gets the point across),

@metagraph.concrete_algorithm('community.louvain')
def nx_louvain_iterations(graph: NetworkXGraphType, output: Literal['list']) -> List[NetworkXGraphType]:
    ...

@metagraph.concrete_algorithm('community.louvain')
def nx_louvain(graph: NetworkXGraphType, output: Literal['final']) -> NetworkXGraphType:
    ...

seibert commented 4 years ago

To capture the discussion from the meeting: if we need to get fancy with a computed return type, we can make the return type a callable that returns a type given the argument signature.
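
A rough sketch of how a callable return type might work, assuming the callable receives the bound arguments as a mapping. The `resolve_return_type` helper, the `louvain_return_type` function, and the placeholder `NetworkXGraphType` class are hypothetical names used only for illustration.

```python
import inspect
from typing import List


class NetworkXGraphType:
    """Placeholder standing in for the concrete type discussed above."""


def louvain_return_type(arguments):
    # Computed return type: depends on the value passed for `output`
    if arguments["output"] == "list":
        return List[NetworkXGraphType]
    return NetworkXGraphType


def nx_louvain(graph: NetworkXGraphType, output: str = "final") -> louvain_return_type:
    ...


def resolve_return_type(func, *args, **kwargs):
    # Hypothetical helper: if the return annotation is a plain function,
    # call it with the bound arguments; otherwise it is already the type
    sig = inspect.signature(func)
    ret = sig.return_annotation
    if inspect.isfunction(ret):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        return ret(bound.arguments)
    return ret


g = NetworkXGraphType()
print(resolve_return_type(nx_louvain, g, output="list") == List[NetworkXGraphType])  # True
print(resolve_return_type(nx_louvain, g) is NetworkXGraphType)  # True
```

With this approach a single concrete algorithm can cover both output modes, at the cost of the return type no longer being readable directly from the signature.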

eriknw commented 4 years ago

I'll just leave this example here...

class AbstractType:
    pass

class ConcreteType:
    def __init_subclass__(cls, *, abstract=None):
        if abstract is None:
            raise TypeError("missing abstract.  Here's how you do it...")
        elif not isinstance(abstract, type) or not issubclass(abstract, AbstractType):
            raise TypeError('blah blah blah')
        cls.abstract = abstract

class MyAbstractType(AbstractType):
    pass

class MyConcreteType(ConcreteType, abstract=MyAbstractType):
    pass

class OopsIForgotAbstract(ConcreteType):  # raises
    pass

seibert commented 4 years ago

__init_subclass__ is also new to me, but I'm glad there is a less wonky way to validate subclasses than some metaclass magic

seibert commented 4 years ago

Closing this issue as we've addressed these topics in #5.