Factories, Registries, Containers, Service Locators, Dependency Injection...

highfalutin commented 2 years ago

I've been trying to figure out the best way to handle object creation / application component assembly in this project. Examples:

invoking the right Person Syncer (there may be multiple choices available; the client wants to use a particular one)
creating a Remote Person object with a reference to the right Remote System

There are a number of approaches we can take (and have been taking). They fall along a spectrum:	manual object creation; class names hard-coded; dependencies manually injected (hardwired)	↔️	objects and services fetched from an independent provider, perhaps by a string key; provider configured by client; dependencies automagically injected at runtime
pros: easy to read; dependencies/collaborators obvious at coding time; simple to test; autocompletion and interface-compliance checks in IDE cons: tight coupling; application basically monolithic; hard to extend; maybe hard to maintain (brittle)	↔️	pros: loosely coupled; very extensible; flexibly configurable even during runtime cons: more complex (overengineered?); dependency/collaborator relationships obscured; maybe harder to read; possibly harder to test; less coding help from IDE

The question I'm grappling with is: where do we want to settle on this spectrum? What's the sweet spot for this project between elegant simplicity/brittleness and powerful flexibility/complexity?

In its current state, the application does different things in different places:

some objects are created manually with hardwired collaborators
a "Factory" (more of an abstract factory) is used in a couple places; it is called statically so it's global; it creates new objects based on string keys; it contains a hard-coded default class registry which can be overridden
some objects pass around class names and use them to dynamically create objects or call static methods

I've been doing a bunch of reading as I try to think through this:

Inversion of Control Containers and the Dependency Injection pattern: https://martinfowler.com/articles/injection.html
Abstract Factory refresher: https://refactoring.guru/design-patterns/abstract-factory
Dependency Injection with Code Examples: https://stackify.com/dependency-injection/
Service Locator Pattern with Code Examples: https://stackify.com/service-locator-pattern/
OOP: The courier anti-pattern: https://r.je/oop-courier-anti-pattern
Dependencies in Disguise: https://thephp.cc/articles/dependencies-in-disguise
In Defense of Service Locator: http://bayou.io/draft/In_Defense_of_Service_Locator.html

What I'm thinking at this moment is that we can still use two approaches: hardwire the parts of the system that seem unlikely to need customization, and use a combination of simple DI/service locator for the parts that need to be customized.

Identifying exactly which classes are most likely to be customized would be a good next step.

artfulrobot commented 2 years ago

I've not read all those, I'd love to one day!

My guilty secret is that I think a lot of these systems over-promise and that in real life, when there's a new/different system that needs to be implemented it's likely to require changes anyway. Like we employ them because we have the foresight to know that in future something will need to change, but unfortunately with hindsight we realise that the thing that changed made the decision about how to provide future flexibility was wrong.

e.g. I love that Civi has api/v3/ directory, because obviously we'll put api4 stuff in api/v4/, right? ...

I think Drupal 8+ went full-on into the DI paradigm. I think the result has been pretty ugly, I abandoned one project after 6 months development! Every new release brought API changes (because we can't foresee what changes will be needed try as we may) and maintaining it became such a cost. Meanwhile, as you point out, developing for it without the IDE being able to comprehend what anything does is a nightmare.

To do anything you need to define it as an abstract contract, implement it in a concrete class, create a service provider class, register all this in a container. Things become so atomised that you can't reason about the bigger business logic. By the end of it it's very hard to locate the actual do-ing code.

The amount of indirections in DI may make application hard to understand. Indirection means code is split and scattered in different places; more indirection makes it more difficult to follow application structure and behavior. (from In defense of service locator)

I think one of the reasons is that the interfaces / contracts change because people choose different products because they offer different features, and they probably do that with different metaphors and structures. In the end, a new cool feature is going to require a tightly coupled set of classes, at least, and possibly some different top-level logic.

Say a competitor/forked collaborator(...) called Doing Community (DC) makes a cool new feature that adds multiple line items to donation pages. Someone using that would need DC specific remote object, mapper, matcher, syncer, and local object classes. All these would be relying on features provided by the others that don't exist in the AN implementation. So these new classes would be tightly coupled and this would need to be configured in the service locator/factory class/some config file. Furthermore, it might require that line items be sync-ed prior to a donation sync, which means a change to the main sync cron jobs.

By the time you've (a) replaced all that code and (b) replaced all the service definitions to use the new code, you're left with loads of classes and it's hard to tell which one is actually being used.

I think refactoring is good when it happens naturally, e.g. oh, I have implemented X and now I need Y. But knowing X well, I can now see the similarities with Y and now I'm in a position to determine how to make the system accommodate X and Y. i.e. the real world concrete different situations are so valuable in informing how to make something flexible. It's hard at the start of something. I feel very much at the start of this project, grappling with a lot of the concepts and it's quite a challenge for me to conceive of more abstraction; but you're more familiar so will be in a better place to do that.

Civi uses Symfony, would it make sense to use that?

True to its form of never quite deciding on something, Civi also involves a combination of DI of the container isntance and static access to it via a facade, Civi::container(). Injecting the container is presumably good because you can have 2 containers :exploding_head: But I've read previously that it's better to inject a service locator, which is basically something that specifies a pre-set list of services (likely from the container), instead of supplying the whole container, since then dependencies are clearer.

At the moment we apply the hard wiring in the static Factory class and the cron job script (which, for example, sets the policy of syncing from remote first - something that might not always be best, depending on lots of factors).

If one civi had to sync some contacts to one system and some to another, a statically bound DI (where the factories are decided at runtime) won't really work, because the context may require different factories. But you could conceive of a runtime services locator being created and passed down. e.g. "I'm the cron job to sync with AN, I will create a SL that builds ActionNetwork\PersonBasic etc. classes and pass that in" (like the remote system is passed in) and "I'm the cron that syncs with DC, I will create a SL that builds DoingCommunity\... objects and pass that down."

What I'm thinking at this moment is that we can still use two approaches: hardwire the parts of the system that seem unlikely to need customization, and use a combination of simple DI/service locator for the parts that need to be customized.

Which parts seem unlikely to need customisation?

Worth remembering that hooks(/event dispatchers and listeners) can also play a useful role.

Bit of a ramble, hope there's something useful in it!

highfalutin commented 2 years ago

in real life, when there's a new/different system that needs to be implemented it's likely to require changes anyway.

Very true. For the most part I'm trying to start with the simplest/dumbest solution to any given problem, and only making it fancier/more complicated when a real need arises. Because if we try to anticipate all the directions the software might go, we'll go crazy and probably get it wrong anyway, as you say. As I say I'm trying.

I think there's a real and present need for some classes not to be hard-wired together in this software, so that's why I'm wading into the bog.

I think Drupal 8+ went full-on into the DI paradigm... To do anything you need to define it as an abstract contract, implement it in a concrete class, create a service provider class, register all this in a container. Things become so atomised that you can't reason about the bigger business logic. By the end of it it's very hard to locate the actual do-ing code.

I would like to avoid this! I have the same complaint about Civi's usage of Symfony DI Containers (through Civi::service() etc). It takes significantly longer to follow code that uses that mechanism.

In the end, a new cool feature is going to require a tightly coupled set of classes, at least, and possibly some different top-level logic.

You give an example of a new remote system that offers different features, requiring a tightly coupled set of object/mapper/matcher/syncer classes, and probably more. Maybe this example supports officially dropping support for alternative remote systems -- see #15.

A much more likely scenario is that a user has a slightly-custom data model in Civi (for example, they have an "Action Network" email location type, and that's what they want to sync with AN instead of primary email), and they need a custom LocalObject and a custom Mapper to handle them. The LocalObject and Mapper will be tightly coupled with each other, but they follow the contracts that we've established between them and Syncers etc. This is something we should support.

I think refactoring is good when it happens naturally, e.g. oh, I have implemented X and now I need Y. But knowing X well, I can now see the similarities with Y and now I'm in a position to determine how to make the system accommodate X and Y. i.e. the real world concrete different situations are so valuable in informing how to make something flexible. It's hard at the start of something. I feel very much at the start of this project, grappling with a lot of the concepts and it's quite a challenge for me to conceive of more abstraction; but you're more familiar so will be in a better place to do that.

Totally. I think this is pretty much what I was saying above about starting simple & dumb, and then waiting for concrete needs to guide me. As for you being at the beginning of your understanding of this project, that's fair enough! Your perspective and experience from previous projects are valuable.

Civi uses Symfony, would it make sense to use that?

See above re: no. Super-overkill for this project. I also find its syntax to be bulky and unclear.

Injecting the container is presumably good because you can have 2 containers 🤯

And as mind-blowing as that seems, I've started to think I might need it (but probably not). We have a user who maintains multiple Action Network accounts and wants keep them all in sync with Civi and thus with each other. Using the tools I've already built, this would mean creating multiple "Sync Profiles" (configs), each with their own Action Network API key, and running syncs between Civi and each one of them in some kind of rotation. If the syncs are all running in the same thread, one way to handle the different configs would be to have a separate container for each one. Even if they're running in sequence within the same thread, not in parallel, we'd need to swap out the container or change it for each one.

Oops, I see you discuss this kind of thing a couple paragraphs down.

But I've read previously that it's better to inject a service locator, which is basically something that specifies a pre-set list of services (likely from the container), instead of supplying the whole container, since then dependencies are clearer.

See below.

At the moment we apply the hard wiring in the static Factory class and the cron job script (which, for example, sets the policy of syncing from remote first - something that might not always be best, depending on lots of factors).

I'd like to remove the hard-wiring from the cron job script. Instead, it should read user-specific config (which may live in a file or in the DB) and use that to run the application.

If one civi had to sync some contacts to one system and some to another, a statically bound DI (where the factories are decided at runtime) won't really work, because the context may require different factories. But you could conceive of a runtime services locator being created and passed down...

What I'm thinking at this moment is that we can still use two approaches: hardwire the parts of the system that seem unlikely to need customization, and use a combination of simple DI/service locator for the parts that need to be customized.

To elaborate: I'm proposing:

Move away from statically bound DI/SL/abstract factory.
Rename "Factory" to "Container".
In addition to creating object instances like Person LocalObjects, the Container will also act as a service locator for things like Person SingleSyncer (which I'm calling a service because it's singleton-per-container)
The Container could also do some object-tree setup, e.g. injecting RemoteSystem into Remote Person objects as it creates them. The Remote Object example may actually be the only place where this kind of automagic DI is needed.
The Container will contain, within itself, the default class mapping, as the Factory does now. Hopefully this makes dependencies a little bit clearer. However the defaults will be able to be overridden and added to.

Which parts seem unlikely to need customisation?

See https://github.com/lemniscus/osdi-client/wiki/Extensibility

Worth remembering that hooks(/event dispatchers and listeners) can also play a useful role.

Yes. These are indirections as well, but you're right they're useful ones.

artfulrobot commented 2 years ago

Re https://github.com/lemniscus/osdi-client/wiki/Extensibility there's https://github.com/lemniscus/osdi-client/issues/14 in which we're prescribing a very opinionated way to sync remote 'fundraising page title' and something similar from Civi's end. So even in our preliminary use-case, I think the Contribution objects remote and local will probably go in the likely-to-be customised/extended. In fact possibly more customised than extended; it's not that there's a clear suits-everybody way to sync the minimum data.

Move away from statically bound DI/SL/abstract factory.

Yes, definitely agree.

Rename "Factory" to "Container". In addition to creating object instances like Person LocalObjects, the Container will also act as a service locator for things like Person SingleSyncer (which I'm calling a service because it's singleton-per-container) The Container could also do some object-tree setup, e.g. injecting RemoteSystem into Remote Person objects as it creates them. The Remote Object example may actually be the only place where this kind of automagic DI is needed. The Container will contain, within itself, the default class mapping, as the Factory does now. Hopefully this makes dependencies a little bit clearer.

So like this:

flowchart TD
S1(["Start to process sync profile #1"]) --> S2
S2["Create a <em>container</em> with<br>config data from sync profile<br>specifying which classes get used"]
S2-->S3["Run the sync, passing in the container"]
S3-->S4["Code calls factory methods and<br>accesses singletons on given container"]
S4-.->|repeat for other profiles|S1
S4-->END([End])

However the defaults will be able to be overridden and added to.

Yes, sounds sensible.

lemniscus / osdi-client

Factories, Registries, Containers, Service Locators, Dependency Injection... #12