polaris-hub / polaris

Foster the development of impactful AI models in drug discovery.
https://polaris-hub.github.io/polaris/
Apache License 2.0

Discussion: `BenchmarkSpecification` modularity #5

Open cwognum opened 1 year ago

cwognum commented 1 year ago

xref: https://github.com/datamol-io/polaris/pull/3#discussion_r1226841168

It appears likely that every benchmark we come up with will require a specific class, e.g. MoleculeScoringSingleTaskBenchmark or PhenomicScoringMultiTaskBenchmark. Most users won't have to worry about this part of the API, as it is abstracted away through the load_benchmark() interface. For the maintainers, however, it is useful to specify the different assumptions (e.g. on the split, the metrics, the predictions, etc.) that are associated with a benchmark.
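
A hypothetical sketch of that separation, assuming a simple registry keyed by benchmark name (the registry and all names below are illustrative, not the actual API):

class MoleculeScoringSingleTaskBenchmark:
    """One of the benchmark-specific classes maintainers would define."""
    ...

# Hypothetical internal registry: users never touch this directly
BENCHMARK_REGISTRY = {
    "molecule-scoring-single-task": MoleculeScoringSingleTaskBenchmark,
}

def load_benchmark(name: str):
    # The only entrypoint users see; the concrete class stays an internal detail
    return BENCHMARK_REGISTRY[name]()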

Since I expect this to be a rather influential design choice, I wanted to create this issue for a preliminary discussion on how to implement a system that accounts for such differences in a scalable manner. I had two thoughts to start the discussion:

  1. Mixin: We implement unit functionality in separate, modular classes. These can then be flexibly combined through inheritance. E.g.:

class ScoringMixin:
    def evaluate(self):
        ...

class SingleTaskMixin:
    def split(self):
        ...

# A new benchmark can now leverage these building blocks!
class SingleTaskScoringBenchmark(SingleTaskMixin, ScoringMixin):
    ...

  2. Observer (Callbacks): An issue I see with (1) is that several mixin classes might implement complementary (or competing) unit functionality. For example, in the above code snippet both the ScoringMixin and SingleTaskMixin might implement the evaluate() function. This could be solved by having a more granular set of units that a mixin class can implement, but at that point I think it might make more sense to switch to a callback system instead. E.g.:

class Benchmark:
    def evaluate(self):
        for callback in self.callbacks:
            callback.pre_evaluate_hook()
        ...

class ScoringCallback(Callback):
    def pre_evaluate_hook(self):
        ...

class SingleTaskCallback(Callback):
    def pre_evaluate_hook(self):
        ...

# A new benchmark now specifies the relevant callbacks
class SingleTaskScoringBenchmark(Benchmark):
    def __init__(self):
        super().__init__(callbacks=[SingleTaskCallback(), ScoringCallback()])

cwognum commented 1 year ago

ping @jstlaurent @hadim

hadim commented 1 year ago

Thanks, Cas!

My gut feeling would be to experiment with the mixin pattern rather than the callback/observer pattern, or another similar pattern called composition. It's been a long time since I've used those design patterns myself, so besides "gut feeling" I don't have a ton of arguments, to be honest xD

Maybe one of the reasons is that I don't really see how the callback pattern would address the issues you've mentioned. For the mixin pattern to work, it's important that every mixin does something that is well scoped, without overlapping with other mixins (this applies to the logic but also to the state). This is the whole challenge here, but I feel like the "classes" and "logic" we are manipulating could be well suited for it.
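
To make the overlap hazard concrete, a minimal, hypothetical illustration: if two mixins define the same method without cooperating via super(), Python's method resolution order silently picks one and the other never runs.

class ScoringMixin:
    def evaluate(self):
        return "scoring"

class SingleTaskMixin:
    def evaluate(self):
        return "single-task"

class MyBenchmark(SingleTaskMixin, ScoringMixin):
    pass

# Resolves to SingleTaskMixin.evaluate(); ScoringMixin.evaluate() is shadowed
print(MyBenchmark().evaluate())  # single-task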

Perhaps we can take inspiration from other Python projects or external resources. Ultimately, the best approach is to move forward with one (mixin would be my suggestion here) and see how it goes!

Finally, even though my suggestion would be to explore the mixin pattern, I am 100% OK if you prefer to explore something else like callback/observer.

cwognum commented 1 year ago

I will not have time to work on this today, so here are some extra thoughts to ponder over the weekend:

Ultimately, I believe the choice between Callback and Mixin comes down to how you specify the "units" (i.e. the functionality that each component implements).

  1. If more than one component implements the same unit, Callbacks make more sense to me personally. By specifying the callbacks as a list, you can specify the priority of the different components as well (see the sketch at the end of this comment).
  2. If each unit is clearly decoupled, then Mixin makes more sense.

Logically the latter makes more sense as a modular system, but since I struggle to think of what the units might look like I'm leaning towards the former. I'm ultimately asking myself what types of benchmarks we might want to add in the future and which components we would thus have to create.

Then finally a word on the Composite design pattern. Here the biggest differentiator seems to be that there is an explicit hierarchy in which to execute the different components and compose their results. Since it assumes a single interface (e.g. execute()) that all components implement, this makes less sense to me.
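
For reference, a minimal Composite sketch (purely illustrative, not proposed API): every node implements the same execute() interface, and a parent composes its children's results through an explicit hierarchy.

class Component:
    def execute(self):
        raise NotImplementedError

class Leaf(Component):
    def __init__(self, value):
        self.value = value

    def execute(self):
        return self.value

class Composite(Component):
    def __init__(self, children):
        self.children = children

    def execute(self):
        # The hierarchy is explicit: a parent aggregates its children's results
        return sum(child.execute() for child in self.children)

print(Composite([Leaf(1), Composite([Leaf(2), Leaf(3)])]).execute())  # 6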

I agree that we should just make an educated guess. We can discuss this briefly during next Monday's meeting and pick a direction. I'm personally leaning towards callbacks still, but I can be easily convinced with some further discussion on how to define the units.
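
To make point 1 concrete, a small hypothetical demo of priority by list order (class names are illustrative):

class Callback:
    def pre_evaluate_hook(self):
        pass

class ScoringCallback(Callback):
    def pre_evaluate_hook(self):
        print("scoring hook")

class SingleTaskCallback(Callback):
    def pre_evaluate_hook(self):
        print("single-task hook")

def run_hooks(callbacks):
    # List order is execution order, so reordering the list reorders priority
    for callback in callbacks:
        callback.pre_evaluate_hook()

run_hooks([SingleTaskCallback(), ScoringCallback()])  # single-task hook, then scoring hook
run_hooks([ScoringCallback(), SingleTaskCallback()])  # scoring hook, then single-task hook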

hadim commented 1 year ago

So far, I can think of at least:

For now, we focus on small molecule scoring (and all the possibilities associated with it) + generative design.

I have a meeting with Oren next week to discuss what would be a good benchmark to include in Polaris for Seldon. I'll keep you posted as soon as I have more details. That will be a good use case to stress-test the API and the assumptions above.

cwognum commented 1 year ago

I didn't realize that with mixins, you can have multiple definitions for the same method. I'll need to read a bit more about how Python determines the order in which to execute things, but that probably makes Mixin the better option after all.

class Base(object):
    def dispatch(self):
        print("Base!")

class FooMixin(object):
    def dispatch(self):
        print("Foo!")
        return super(FooMixin, self).dispatch()

class BarMixin(object):
    def dispatch(self):
        print("Bar!")
        return super(BarMixin, self).dispatch()

class FooBar(FooMixin, BarMixin, Base):
    pass

FooBar().dispatch()
"""
> Foo!
> Bar! 
> Base!
"""
jstlaurent commented 1 year ago

I'll need to read a bit more about how Python determines the order in which to execute things

The order follows Python's method resolution order (MRO): left to right through the parent classes declared on the child class.
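
For the FooBar example above, the resulting method resolution order can be inspected directly:

print([cls.__name__ for cls in FooBar.__mro__])
# ['FooBar', 'FooMixin', 'BarMixin', 'Base', 'object']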

jstlaurent commented 1 year ago

More generally, I'd prefer the mixin approach to start with. The observer pattern is powerful and flexible, but it's also more complex and makes it more difficult to determine the behavior of the assembled class. I would wait until our needs require that additional complexity before adding it.

I think it's ok if some mixins are incompatible with each other. I'd suggest defining the base API in a Benchmark class that would serve as the base for all child compositions. Something like:

class Benchmark:
    def evaluate(self, *args, **kwargs):
        pass

    def split(self, *args, **kwargs):
        pass

    def do_something_else(self, *args, **kwargs):
        pass

    def run(self, *args, **kwargs):
        """
        Entrypoint method
        """
        self.split()
        self.do_something_else()
        # etc.
        self.evaluate()

Each mixin can implement the portion of that API that makes sense, and invoke super() to chain execution.

class ScoringMixin:
    def evaluate(self):
        super(ScoringMixin, self).evaluate()

class SingleTaskMixin:
    def split(self):
        super(SingleTaskMixin, self).split()

We need to make sure that all composed benchmark classes use Benchmark as their base (last class in the parent classes definition), so that the super call chain terminates:

# A new benchmark can now leverage these building blocks!
class SingleTaskScoringBenchmark(SingleTaskMixin, ScoringMixin, Benchmark): 
    pass

And with that, you get something that is pretty flexible, but also understandable and fairly easy to unit test.
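
A minimal runnable sketch of that chain, with prints added purely for illustration (class names follow the examples above):

class Benchmark:
    def evaluate(self):
        print("Benchmark.evaluate (end of the chain)")

class ScoringMixin:
    def evaluate(self):
        print("ScoringMixin.evaluate")
        super().evaluate()  # chains to the next class in the MRO

class SingleTaskScoringBenchmark(ScoringMixin, Benchmark):
    pass

SingleTaskScoringBenchmark().evaluate()
# ScoringMixin.evaluate
# Benchmark.evaluate (end of the chain)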

cwognum commented 1 year ago

I defined a base BenchmarkSpecification class which has the get_train_test_split() endpoint. Any object instantiated from this class will have the split attribute, which specifies how to do the split. Without making assumptions about where a mixin gets mixed in, how does a SplitterMixin access these attributes? The following would work:

class Benchmark:

    def get_train_test_split(self):
        return self.execute_split(self.split)

    def execute_split(self, indices, **kwargs):
        ...

class SplitMixin:
    def execute_split(self, indices, **kwargs):
        ...

Is there a better way?

If not, would you make the execute_split() method private / protected (i.e. _execute_split())?

jstlaurent commented 1 year ago

I'm not quite sure what you're asking @cwognum. Is your question "How do mixins access attributes from the class they are mixed-in with?" If so, this works:

class Benchmark:

    splits: list[SplitType] = []

class SplitMixin:
    """
    Intended to be mixed in with something that has a 'splits' attribute
    """
    def method_using_attribute(self):
        for split in self.splits:
            ...

class SplittingBenchmark(SplitMixin, Benchmark):
    """
    Can call 'method_using_attribute' from SplitMixin and it all works
    """
    def __init__(self, splits: list[SplitType]):
        self.splits = splits
Or are you asking about something else?

cwognum commented 1 year ago

Thank you! I assumed that wouldn't work, but looking at it again I'm not sure why. What is throwing me off is that the IDE gives an unresolved attribute reference for the SplitMixin class in your example. Is there a way to specify which attributes we expect for the mixin class?

jstlaurent commented 1 year ago

One way to do it in a mixin is to define them there as well:

class SplitMixin:
    """
    Intended to be mixed in with something that has a 'splits' attribute
    """
    splits: list[SplitType] = []

    def method_using_attribute(self):
        for split in self.splits:
            ...

It does get a little cumbersome with a lot of mixins. You can also have the mixin class inherit from the base class:

class Benchmark:

    splits: list[SplitType] = []

class SplitMixin(Benchmark):
    """
    Intended to be mixed in with something that has a 'splits' attribute
    """
    def method_using_attribute(self):
        for split in self.splits:
            ...

It's a little clunky because what we'd really want to have is Traits, but Python doesn't support that construct.

Or you can do what most people do and add a noqa comment on the line.
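
For example (the exact suppression directive depends on the linter or IDE; # noqa is the flake8-style spelling):

class SplitMixin:
    def method_using_attribute(self):
        # 'splits' is provided by the class we are mixed into
        for split in self.splits:  # noqa
            ...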

cwognum commented 1 year ago

xref #6