inpsyde / modularity

A PSR-11 implementation for WordPress Plugins, Themes or Libraries.
https://inpsyde.github.io/modularity/
GNU General Public License v2.0
42 stars 4 forks source link

Support "deferred" modules #12

Open gmazzap opened 3 years ago

gmazzap commented 3 years ago

This issue describes an additional funcitonality that I think would be nice to have, based on the experience of using WP App Container for plugins and from discussion with colleagues trying to implement modularity in their plugins.

The scenario

Let's imagine we have a simple plugin that needs to perform a couple of operations.

The first operation must be done pretty early, let's say on "init".

The second operation can't happen that early, because it depends on a condition that is only possible to check at a later hook.

The plugin could use two modules, one per operation.

Let's ignore for now the first module, and let's imagine how the second module could look like.

class MyTemplateRedirectModule implements ServiceModule, ExecutableModule
{
    public function services(): array
    {
       // register here a few services
    }

    public function run(ContainerInterface $c): bool
    {
       return add_action(
           'template_redirect',
           function () use ($c) {
               if ( ! is_post_type_archive(MyCpt::NAME) ) {
                   return;
               }

               // do things here using services from container
           }
       );
    }
}

This a quite common situation. We only need to do "things" on specific type of frontoffice request, but the first hook to safely check for conditional tags is template_redirect.

The issue

Now, if in our website we have a request that satisfies our condition once every 1000 requests, it means that 99.9% of the times the module registers services in the container for no reason.

When the number of services registered is not trivial, register services might be resource consuming, because pollutes memory and trigger autoload (which might be slow file-based operation).

When template_redirect runs, it is too late to add things in the container, because the plugin boot is executed earlier (because there's a plugin functionality that needs to run earlier).

That's why we check the condition inside run(), but the container instance passed there is read-only, so we need to add services earlier.

In short, the problem is that when we have a module that should register things conditionally, and the condition is not available very soon (quite common in WordPress), we are forced to register things unconditionally and then check the condition later.

The possible solution

I think that a possible solution could be to introduce a DeferredModule interface, similar to this:

interface DeferredModule extends Module
{
    public function deferredTo(): string;

    public function factoryModule(Package $package): ?Module;
}

and a Package::addDeferredModule() method, with a signature like this:

public function addDeferredModule(DeferredModule $module): Package;

Internally, the method would add an action to the hook returned by DeferredModule::deferredTo() and inside that hook, it would call DeferredModule::factoryModule() and the returned module, if any, would be added in the same way it is hadded when calling Package::addModule().

The code could look like this (it's an idea, completely untested):

public function addDeferredModule(DeferredModule $module): Package
{
        add_action(
            $module->deferredTo(),
            function (): void {
                $deferredModule = $module->factoryModule($this);
                if (!$deferredModule || $this->statusIs(self::STATUS_FAILED)) {
                    return;
                }

                // services/factories/extensions are added here...

                if (($deferredModule instanceof ExecutableModule) && $this->statusIs(self::STATUS_BOOTED)) {
                    // deferred + executable module is executed right-away, when package was already booted
                    $module->run($this->container());
                }
            }
        );
    }

With the above code in place, in the scenario I described a the beginning, the MyTemplateRedirectModule could be rewritten like this:

class MyTemplateRedirectModule implements ServiceModule, ExecutableModule, DeferredModule
{
    public function deferredTo(): string
    {
        return  'template_redirect';
    }

    public function factoryModule(Package $package): ?Module
    {
       return is_post_type_archive(MyCpt::NAME) ? $this : null;
    }

    public function services(): array
    {
       // register here a few services
    }

    public function run(ContainerInterface $c): bool
    {
        // do things here using services from container
    }
}

In this case the deferred module return itself from factoryModule(), but it could return another module.

Anyway, having the proposed code in place, by calling the following line the issue is solved in a pretty simple way.

$package->addDeferredModule(new MyTemplateRedirectModule());

For requests that are not archive of a specific post type, not only the module does not execute anything, it also does not register anything.

Additional notes

Considerations on addDeferredModule()

The code proposed above for addDeferredModule() is simplified. There are other things to consider, e.g.:

Deferred modules for inter-module dependency

Deferred modules could be used for another use case, thanks to the fact that the DeferredModule::factoryModule() receives an instance of the package.

Imagine a module that should be added only if another module is present.

Package class has the moduleIs() method, but that can't be used on Package::ACTION_INIT to determine if a given package was added or not, because maybe the target package will be added later. And in any case, can't be used to check if a module was executed or not, because the execution happens later.

With the addition of the proposed functionalities, it would be possible to do things like:

class SomeModuleDependantModule implements ServiceModule, ExecutableModule, DeferredModule
{
    public function __construct(string $hookName)
    {
        $this->hookName = $hookName;
    }

    public function deferredTo(): string
    {
        return  $this->hookName;
    }

    public function factoryModule(Package $package): ?Module
    {
        return $package->moduleIs(SomeModule::class, Package::MODULE_EXECUTED)
            ? $this
            : null;
    }

    public function services(): array
    {
       // register here a few services
    }

    public function run(ContainerInterface $c): bool
    {
        // do things here using services from container
    }
}

$plugin = plugin();
$module = new SomeModuleDependantModule($plugin->hookName(Package::ACTION_READY));
$plugin->addDeferredModule($module);

In the snippet above, I added a deferred module that when the package is ready (ACTION_READY action is fired) checks if the SomeModule module was executed successfully, and only if so will add itself as an additional module.

A similar workflow can be currently implemented. In fact, currently it is possible to do:

add_action(
    $package->hookName(Package::ACTION_READY)
    function (Package $package) {
        if ($package->moduleIs(SomeModule::class, Package::MODULE_EXECUTED)) {
            (new SomeModuleDependantModule())->run()
        }
    }
);

So we can currently execute SomeModuleDependantModule only if SomeModule was executed successfully, but we can't currently add services at that point, because when ACTION_READY hook is fired, the container is read-only, so any service required by SomeModuleDependantModule would need to be registered unconditionally.

overclokk commented 11 months ago

Why not adding a Proxy to handle that?

With a Proxy any service will not be instantiated untill some method will be executed so they are deferred by default, good for WordPress Events.

I implemented such system here https://github.com/ItalyStrap/empress/blob/master/src/ProxyFactory.php and I pass this callable to a modified version of Auryn\Injector I made https://github.com/ItalyStrap/empress/blob/149a7b25a83fb97564eff0b58b3997ae9e3b3f9b/src/AurynConfig.php#L112 and here https://github.com/ItalyStrap/empress/blob/149a7b25a83fb97564eff0b58b3997ae9e3b3f9b/bridge/Injector.php#L342

Now to declare a service as a proxy I only need to add the service::class name to Injector::proxy.

For doing so I use a library made by Ocramius https://github.com/Ocramius/ProxyManager this library is very strict about PHP version but there is also a fork with LTS support https://github.com/FriendsOfPHP/proxy-manager-lts

And if you are concern about performance you can tuning it as well https://github.com/Ocramius/ProxyManager/blob/2.15.x/docs/tuning-for-production.md

gmazzap commented 11 months ago

@overclokk A proxy can not fully help here. The problem is that the services right now must be added to the ContainerConfigurator before "init". After that hook is called, the container configurator creates the container which is read-only. At that point no services can be added anymore.

When I say "the services must be added" I'm not referring to service instances, but to services factory callbacks. That is, callbacks that when executed will instantiate the service.

To reinforce: all services are already "lazy". When you register a service, it is not instantiated at all. It is only instantiated when/if another service requires it. It is already "kind-of proxied". This issue aims at reducing the memory footprint because even if lazy, the factory callback must be stored in memory, just like the proxy would need to be created and stored in memory, so there's no way we can reduce memory footprint via proxy.

The key point here is that the read-only container we use is very strict, and it forces us to register everything upfront.

Registering a factory callback or registering a proxy is not much different conceptually. If the read-only container would allow us to register factory callbacks after "init", yes, we could achieve this via a proxy.

That would have the benefits that the proxied service could be passed to other services before it is ready.

But it would hve the disadvantage that it would be more memory-heavy, besides adding a 3rd party dependency to a package that is used pretty much in every other package in the company, and that does not sound like a great idea to me.

So IMO the thing to solve here is make the read-only container nor read-only anymore, but maybe append-only, and possibly locked from the external. That is a pre-requisite to make any use of a proxied service.

But, to be honest, if I realize I need the service to be injected to other service before a certain hook, I can just register the service upfront as we do now.

The only case in which a Proxy would at that point be useful, is that a service instantiation is heavy, but that is something we should solve by having constructors that are mostly noop, not encoraging proxies, especially when that means to introduce dependency to very opinionated 3rd-party dependencies.

Biont commented 11 months ago

I get the spirit of this idea and it's actually something I've been discussing a few times in the recent past. My TLDR is:

I would like to collect and see data that supports the notion that there is a problem with "over-registering" services. Supporting deferred modules presents a quite substantial shift away from a "rigid monolith" where everything is known up-front and I would like to see if we actually have a problem - and if it also exists in production environments


Long answer: I actually see the "rigid monolith" as an aid in the sometimes confusing multi-context environment that is WordPress. While you could technically already have some dynamic components by conditionally adding/withholding modules and connecting Packages, one core aspect remains once Package::boot ran:

Everything that is declared is also available.

This makes it very straightforward to conceptualize your "large library of services". If you get a ContainerException, you know there's a fundamental problem with your service declarations and there's only a handful places to look for a solution. Deferring modules in a way that also places the maintenance burden on the module developers adds a large amount of complexity to that somewhat safe base assumption. A modular service container isn't exactly the first thing you crayon into Notepad++ during your first PHP tutorials, so I have some apprehension to things that add more noise. Now you get a ContainerException, but it could be for an almost infinite amount of reasons and you're back in the hellscape of hook priorities and is_admin || !is_ajax() checks

I realize I'm arguing very subjectely and gut-feely here. This is because I quite fully agree with the reasoning behind the feature proposition on a technical level, but I have concerns that it is a "solution looking for a problem". So do we have data that suggests we need to invest in this? And are the potential benefits substantial enough to introduce such a paradigm shift in the way we look at the list of declared services?

If we have that data:

Maybe there is a way to do this transparently? Imagine we'd have a build/caching step for production environments. It could essentially array_keys($module->services())and transform the result into a map of MyCpt::NAME => [ ModuleA::class, ModuleB::class ]¹ stored in a file called $hashOfModuleList . '.php' ².

When a service is requested, we would check this map and internally load additional declarations from modules if they're not present already. This would

This would sufficiently defer the loading of excessively large modules into memory without any new user-land code and it might even cover scenarios that would otherwise require an exceptionally steady hand in assembling/configuring modules & declarations.

¹ it's an array because a service can be overridden. Maybe we could simply use the last Module as a single string though. It's just a pencil sketch ² Example: md5(implode(array_map('get_class',$modules))) . '.php', but I did not consider connected packages because I am not very experienced with that feature. The dynamic name is because the list of modules can already be dynamic depending on the current request. Again, pencil sketch

overclokk commented 11 months ago

@gmazzap

The problem is that the services right now must be added to the ContainerConfigurator before "init". After that hook is called, the container configurator creates the container which is read-only. At that point no services can be added anymore.

Yes, I know this, I had the time to study the code for one of ours project.

To reinforce: all services are already "lazy". When you register a service, it is not instantiated at all. It is only instantiated when/if another service requires it. It is already "kind-of proxied". This issue aims at reducing the memory footprint because even if lazy, the factory callback must be stored in memory, just like the proxy would need to be created and stored in memory, so there's no way we can reduce memory footprint via proxy.

Ok, if you talk about only the registration part ok, I'm with you (I hate all that boilerplate stuff), if you do not have some autowiring mechanism you need to register all services before they are called.

Registering a factory callback or registering a proxy is not much different conceptually. If the read-only container would allow us to register factory callbacks after "init", yes, we could achieve this via a proxy.

Conceptually does not but tecnically yes, they are different beast.

$service = $c->get('SomeService'); // This give you the real service
$service instanceof 'SomeService';
...
$c->proxy('SomeService', fn() => do stuff);
$service = $c->get('SomeService'); // This give you the "fake" service
$service instanceof 'SomeService'; // Fafe service with correct Interface|whatever

// do some stuff in you application
$service->callRealMethod(); // At this point the real object is instantiated with all dependency trees.

The key point here is that the read-only container we use is very strict, and it forces us to register everything upfront.

Maybe this could also be solved with reflection?

The only case in which a Proxy would at that point be useful, is that a service instantiation is heavy,

Not only for that, becuse the nature of proxies, if you register one Proxy for admin_init and you are on FO, the object it will never been instantiated.

besides adding a 3rd party dependency to a package that is used pretty much in every other package in the company, and that does not sound like a great idea to me.

You don't need to add another dependency, you only need to add a "suggest" key in composer.json if you prefer.

At this point I propose also the Autowiring concept, with this you can save memory because you do not need to register every service in the application but only a few (for example services that need to be shared so only one instance is created.)

And you will say, "But that use the Reflection..." and I answer,yes, but the reflected obect could|should be cached in some way.

I use Auryn\Injector for Autowiring stuff with Proxy pattern (implemented by me) because how the WordPress events works.

Time ago I worked on a PR for implementing the Proxy pattern in Auryn\Injector but was not merged because there was no interest in that, it was merged in another forked package and also I kept the code and I use my own version for that reason.

overclokk commented 11 months ago

For today ID I made two experiment, one with a simple reflection for the ReadOnlyContainer and one with adding a container that also handle Autowiring and Proxies to the \Inpsyde\Modularity\Package().

I made Reflection only for fun, but if you have to add this just make the ReadOnlyContainer a MutableContainer, more simple solution (I know that with a more strict way developers do less damage 😅)

The other experiment was with an Autowiring container and the ProxyManager, for autowiring I use Auryn\Injector (my forked version).

Injector needs to be "configured" only for a few things, aliasing Interface to Concrete, Sharing if you need to share something and some less used things, anyway most of the configuration happens with string and not with factories at least if you need those in some scenario.

The simple implementation here explain what I did:

function injectorPreparation(): \ItalyStrap\Empress\Injector
{
    $injector = new \ItalyStrap\Empress\Injector();
    $injector->share($injector);

    $injector->alias(\ItalyStrap\Config\ConfigInterface::class, Config::class);
    $injector->share(Config::class);

    $injector->proxy(ExampleWithDependency::class, new ProxyFactory());

    return $injector;
}

/**
 * Provide the plugin instance.
 *
 * @link https://github.com/inpsyde/modularity#access-from-external
 */
function plugin(): Package
{
    static $package;
    if (!$package) {
        $injector = injectorPreparation();

        $properties = PluginProperties::new(__FILE__);
        $package = Package::new($properties, new Container(
            $injector,
            $injector->make(\ItalyStrap\Config\ConfigInterface::class)
        ));
    }

    return $package;
}

ExampleWithDependency::class is a concrete class so there is no need to aliases it and there is no need to "configure" its own dependency, untill they are concrete, if they are Interfaces an error will be thrown.

After that I created a module and inside the run method just:

        $example = $container->get(Concrete\ExampleWithDependency::class);
        \var_dump($example);

The dump value is not the real object but the proxy object and the dependency is never instantiated, only if you call $example->doSomething(); (or whatever method) also the dependency is instantiated, this means that if you do somethig like this \add_action('template_redirect', $example); (let say that the object is callable) all the tree of dependencies are instantiated only at 'template_redirect' not before.

BTW: The `services()' was empty, no need to configure factories here.

Now if you need more in depth examples or performance test let me know and I'll try to add those.

Biont commented 11 months ago

I still have not understood how proxies and autowiring are relevant to the issue presented in the OP.

Let's say your project contains aCliModule that is strictly only needed in wp-cli calls. But it contains two million service declarations. The concern is that parsing the service definitions of giant modules and keeping them in memory might be an issue. And that there could be room for optimization if it's known that a module is only needed if certain conditions are met.

Whether or not a module's services are autowired and/or return proxies is not relevant. Am I missing something?

Don't get me wrong, I have a fair bit of experience writing and consuming autowired containers and I've written a proxy library myself in order to transparently break circular dependency chains during autowiring. It's awesome! But it's not the point here.

overclokk commented 11 months ago

@Biont

Let's say your project contains aCliModule that is strictly only needed in wp-cli calls. But it contains two million service declarations

At this point I think you have another kind of problem.

Whether or not a module's services are autowired and/or return proxies is not relevant.

I don't get why is not relevant.

Give me some real code where I can do benchmark, just for fun.