openml / openml-python

Python module to interface with OpenML
https://openml.github.io/openml-python/main/

Extensions: `run_model_on_task(model, ...)` should accept which extension to use #1312

Open eddiebergman opened 6 months ago

eddiebergman commented 6 months ago

Description

When running run_model_on_task(model, ...), it uses registered extensions to decide which one to use. If we are to migrate to having extensions as their own packages, it might be cleaner to just have users pass the extension in explicitly (after all, they did download it). One possible reason is two Extensions that work with the same model class. We could still use the register feature; however, these global lists can be brittle when it comes to multiprocessing and other shenanigans that go beyond the simplest use case.
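
A minimal sketch of the explicit-passing idea, to make the ambiguity case concrete. Everything in it is hypothetical: `resolve_extension`, the `extension` argument, the `REGISTRY` list, and the two stand-in extension classes are illustrative only and are not the current openml-python API.

```python
from __future__ import annotations

from typing import Any


class SklearnLikeExtension:
    """Hypothetical extension; stands in for a separately installed package."""

    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        return isinstance(model, dict)


class OtherDictExtension:
    """A second hypothetical extension that claims the same model class."""

    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        return isinstance(model, dict)


# Stand-in for a global registry populated at import time (an assumption).
REGISTRY: list[type] = [SklearnLikeExtension, OtherDictExtension]


def resolve_extension(model: Any, extension: type | None = None) -> type:
    """Return the explicitly passed extension, else fall back to the registry."""
    if extension is not None:
        if not extension.can_handle_model(model):
            raise ValueError(
                f"{extension.__name__} cannot handle {type(model).__name__}"
            )
        return extension
    # Implicit lookup: only safe when exactly one registered extension matches.
    candidates = [ext for ext in REGISTRY if ext.can_handle_model(model)]
    if len(candidates) != 1:
        names = [ext.__name__ for ext in candidates]
        raise ValueError(f"Expected exactly one matching extension, found {names}")
    return candidates[0]
```

With two extensions claiming the same model class, the implicit lookup has to either guess or fail, whereas `resolve_extension(model, extension=OtherDictExtension)` is unambiguous and does not depend on what happens to be registered in the current process.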

Another, more implicit option is to skip registration entirely and rely on inheritance, so that Extension.__subclasses__() can be used to discover extensions. In the odd case that a subclass does not want to be discovered, it could advertise registered: ClassVar[bool] = False; the attribute defaults to True.

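from __future__ import annotations

from typing import ClassVar

class Extension:
    # Discoverable by default; a subclass opts out by setting this to False on itself.
    registered: ClassVar[bool] = True

    @classmethod
    def registered_extensions(cls) -> list[type[Extension]]:
        found: list[type[Extension]] = []
        # __subclasses__() only returns direct children, so recurse over the tree.
        for subcls in cls.__subclasses__():
            # Read the class's own __dict__ so an opt-out is not inherited by children.
            if subcls.__dict__.get("registered", True):
                found.append(subcls)
            found.extend(subcls.registered_extensions())
        return found

# Accessible
class ExtensionA(Extension):
    pass

# Not intended to be used as an extension, but rather inherited from
class ExtensionOtherBase(Extension):
    registered: ClassVar[bool] = False

# Accessible
class ExtensionB(ExtensionOtherBase):
    pass

PGijsbers commented 6 months ago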

Are ExtensionA and ExtensionOtherBase supposed to inherit from Extension in the example?

Either way, I am not sure yet what I would personally prefer. And a lighter connection through Protocols might also be an option.

eddiebergman commented 6 months ago

Whoops they are, will update the example!

Could do protocols technically if there's no logic to be done in the Extension base class
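
For reference, a rough sketch of what the Protocol route could look like, assuming the base class carries no shared logic. The protocol name and the two method names are chosen for illustration only and are not a proposal for the actual interface.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ExtensionProtocol(Protocol):
    """Hypothetical structural interface; method names are illustrative."""

    def can_handle_model(self, model: Any) -> bool: ...

    def model_to_flow(self, model: Any) -> dict[str, Any]: ...


class DictExtension:
    """Satisfies the protocol without inheriting from any base class."""

    def can_handle_model(self, model: Any) -> bool:
        return isinstance(model, dict)

    def model_to_flow(self, model: Any) -> dict[str, Any]:
        return {"name": type(model).__name__, "parameters": dict(model)}


class NotAnExtension:
    """Has none of the required methods, so it fails the structural check."""


def is_extension(obj: object) -> bool:
    # runtime_checkable only verifies that the methods exist, not their signatures.
    return isinstance(obj, ExtensionProtocol)
```

One consequence of this design: classes that merely satisfy a Protocol do not show up in `__subclasses__()`, so discovery through inheritance would no longer work and extensions would have to be passed in explicitly or registered some other way.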