How to handle large aggregates - defining multiple aggregate roots

webdevilopers / php-ddd

PHP Symfony Doctrine Domain-driven Design

202 stars 10 forks source link

How to handle large aggregates - defining multiple aggregate roots #19

Open webdevilopers opened 7 years ago

webdevilopers commented 7 years ago

We have a Contract Domain Model holding a huge collection of Timesheets. I just added some simple Symfony code snippets using Doctrine Entities and Annotations. All Entities are later moved to Acme\Contract\Domain\Model namespace and XML Mapping inside Acme\Contract\Infrastructure\Persistence\Doctrine etc..

use Acme\Contract\ContractBundle;

/** 
 * @ORM\Entity
 */
class Contract
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     */
    public $id;

    /**     
     * @ORM\Column(type="string", name="status")
     */
    protected $status;

    /**
     * @ORM\OneToMany(targetEntity="Timesheet", mappedBy="contract", cascade={"persist"})
     * @todo Use LAZY loading?
     */
    private $timesheets;

    public function addTimesheet(Timesheet $timesheet) {
        // Policy
        if ($this->isClosed()) {
            throw new ContractAlreadyClosedException();
        }

        $this->timesheets[] = $timesheet;
    }
}

use Acme\Contract\ContractBundle;

/**
 * @ORM\Entity
 */
class Timesheet
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     */
    private $id;

    /**
     * @ORM\ManyToOne(targetEntity="Contract", inversedBy="timesheets")
     * @ORM\JoinColumn(name="contract_id", referencedColumnName="id")
     */
    private $contract;
}

Currently we are using CQRS and a single Symfony app. Using Contract as the only AR forces our policy on the invariant:

use Acme\Contract\Application\Service;

class WriteTimeSheetHandler
{
    public function handle($command)
    {
        $contract = $this->contractRepository->find($command->contractId());
        $timesheet = new Timesheet($command->start(), $command->end());
        $contract->addTimesheet($timesheet); // Check invariant
    }
}

There can be many related timesheet aggregates. This gives us the hint that Timesheet itself should be handled as an Aggregate Root (AR):

http://stackoverflow.com/questions/37662541/ddd-do-i-really-need-to-load-all-objects-in-an-aggregate-performance-concerns Since we have to query timesheets it already has its own repository - so it is an AR too.

Another solution we thought about is creating a new Contract Domain Model inside a different Bounded Context where the Timesheet will be living in. This would lead us to Eventual Consistency but it would be possible with our current infrastructure where bounded contexts are shared across the app.

As suggested by @mgonzalezbaile:

use Acme\Contract\Application\Service;

class WriteTimesheetHandler
{
    public function handle($command)
    {
        $timesheet = new Timesheet($command->start(), $command->end());
        $this->timesheetRepository->save($timesheet);

        $this->raiseEvent(TimesheetCreated($contractId));
    }
}

class TimesheetCreatedListener
{
    public function __constuct($event)
    {
        $contract = $this->contractRepository->find($event->contractId());
        $timesheet = $this->timesheetRepository->find($event->timesheetId());
        $contract->addTimesheet($timesheet); // Check invariant
        $this->contractRepository->save($contract);
    }
}

But this way the Contract still has the relations to existing timesheets and all of them would be loaded.

Other thought on large aggregates by @voroninp:

http://programmers.stackexchange.com/questions/250073/how-to-work-with-large-aggregate-roots

Instead of loading all the timesheets we thought about removing them completely from the Contract. The only connection would be on the Timesheet aggregate via reference (ContractId):

class Timesheet
{
    private $contractId;

    public function __construct(Contract $contract, $start, $end)
    {
        // Policy - move to domain service using repository?
        if ($contract->isClosed()) {
            throw new ContractAlreadyClosedException();
        }

        $this->contractId = ContractId::create($contract->id()); // UUID value object soon
        $this->start = $start;
        $this->end = $end;
    }
}

Still we would load the Contract Aggregate and pass it to the Timesheet aggregate in order to check the invariant. But passing the Contract AR to the Timesheet constructor doesn't feel good.

Another variation is moving the policy out of the Timesheet into a Domain Service. This Domain Service CanOnlyAddTimesheetToPendingContract could receive the contract repository. I think this is absolutely legal in DDD. We would only have to pass the ContractId to the constructor instead of the full domain model.

Though I like this approach @VoiceOfUnreason states here for a similar example:

http://programmers.stackexchange.com/a/311295/155135

Which is to say, if the rules for modifying a Timesheet depend on the current state of the Employee entity, then Employee and Timesheet definitely need to be part of the same aggregate, and the aggregate as a whole is responsible for ensuring that the rules are followed.

An aggregate has one root entity; identifying it is part of the puzzle. If an Employee has more than one Timesheet, and they are both part of the same aggregate, then Timesheet is definitely not the root. Which means that the application cannot directly modify or dispatch commands to the timesheet - they need to be dispatched to the root object (presumably the Employee), which can delegate some of the responsibility.

https://github.com/webdevilopers/php-ddd/issues/1

webdevilopers commented 7 years ago

I took another look into "Practices of Domain-Driven Design" (@PPDDD) by @elbandit and @NTCoding. In chapter 19 p. 433 there is a concrete example showing how to work with IDs only and still have the other AR check the invariants.

The solution for my example should look like this:

use Acme\Contract\Application\Service;

class WriteTimeSheetHandler
{
    public function handle($command)
    {
        $contract = $this->contractRepository->find($command->contractId());
        $timesheet = $contract->writeTimesheet($command->start(), $command->end()); // Check invariant
        $this->timesheetRepository->save($timesheet);
    }
}

The only thing I wonder is how to enforce the invariants checked? Since a developer could still use the constructor of the Timesheet AR by simply passing the required contractId. Or is it enough to make the constructor private and use a static factory method on the Timesheet AR? Still a developer could create the aggregate without going through the invariant of Contract. As far as I understand the invariants have to be enforced for the "outside world" - the User and its Interface - "orchestrated" by the Application layer.

I wonder how to adapt these rules if a new AR is added e.g. an Employee who can only work for a certain time range and is - of course - not allowed to work twice a the same time.

Now we have two AR having to check invariants. I guess this is the right place for an Domain Service, right?

use Acme\Contract\Application\Service;

class WriteTimeSheetHandler
{
    public function handle($command)
    {
        $timesheet = $this->timesheetService->write(
            $command->contractId(),
            $command->employeeId(), 
            $command->start(),
            $command->end()
        );
        $this->timesheetRepository->save($timesheet);
    }
}

use Acme\Contract\Domain\Model;

class TimesheetService
{
    public function write($contractId, $employeeId, $start, $end)
    {
        // Get aggregate roots from injected repositories.
        // Check invariants with simple methods like `isClosed` etc.
        $timesheet = new Timesheet(...);
        return $timesheet;
    }
}

NTCoding commented 7 years ago

I think at this level Domain-Driven Design is quite flexible and is not prescriptive about any of those approaches.

Firstly, if the collection of timesheets on a contract is huge - enough to be a performance problem, that's an acceptable excuse to make the timesheets a look up in my opinion. And then by doing that, you could have a repository with specifi queries to only load a subset of the timesheets.

I also find the invariants you mention quite interesting. Basically, you can't create a timesheet if the contract has expired. In a single model, it seems quite appealing to put that on the contract. That way, you can't add the timesheet to a contract if the contract is not active.

But it does make me question the overall design: would timesheets be a separate bounded context. From the code I can see here, it seems like the system is a timesheet system and the contract is just a way to group timesheets. So it seems fine to me. Does the contract have any meaning without the timesheets?

I'm not that keen on the inspiration you got from our book :D I don't like the idea you can create the timesheet and easily avoid the invariants.

To be honest, I'd probably have a method on contract: contract.addTimeSheet(employeeId, start, end) to enforce those variants. If the system grew and a contract had many other associated concepts then timesheets looks like a candidate for a completey separate bounded context.

If it was a separate bounded context, then I think the invariant would live in the timesheet. The timesheet bounded context would RPC over to the contracts or store the contract status in the timesheets bounded context, and check the statuse before the timesheet was created.

Hopefully that makes sense.

webdevilopers commented 7 years ago

Thanks for your thoughts @NTCoding ! I totally agree on your idea of different bounded context!.

Currently the Contract is the CoreDomain. Not only does it implement time but also defect recording (inspection results of a quality management system). Technically we are yet not able to run separate apps and databases for the BCs. That's why we are using one big app.

The Contract Domain Model is a - maybe the - central point. Sometimes a Contract can run over months with up to 100 daily sheets and results. That's why we at least need repositories for custom querying. But maybe this can be done with pure SQL instead of using an ORM. But the letter is heavily implemented and works fine for creating our READ models so far. So we have repositories for Timesheets and that's what makes it an Aggregate Root - since it can be persisted by the repository, right?

At the bottom line the Contract holds the Status invariant and some more.

If we had Bounded Contexts would you add a new Contract Domain Model just holding the status? Being updated by an event when the Contract is closed in the Core Domain?

How would you enforce the invariant then? My first thought was a factory method passing the desired Contract:

final class Timesheet
{
    private $contractId;

    public static function addToContract($contract, $employee, $start, $end)
    {
        if ($contract->isClosed()) {
            throw new ContractAlreadyClosedException();
        }        

        $this->contractId = $contract->id();
        // ...
        return static(...); // using private constructor
    }
}

Using a private constructor will prevent creating a Timesheet without passing the invariants.

This version could be used with the BCs AND the current Shared Kernel app. Because we are passing the Contract itself - or at least the attributes of one Contract Model in that context.

What do you think?

NTCoding commented 7 years ago

It's hard to say to be honest. I get the slight feeling that the concept of a contract is just the ID that holds things together. E.g. the timesheets. Is it possible you can list the main aggregates in your system and their general responsibility?

webdevilopers commented 7 years ago

After my holidays @NTCoding , ok? ;)

What I can tell already is that a Contract can have Timesheets and InspectionResults. And there are also Timesheets that can be used for Employees only. So Timesheets could definitely belong to a TimeRecoding Bounded Context.

Just a final general question regarding my last example: Is it a common approach to pass an aggregate root (like Contract) to a factory method of another aggregate root to use it for protecting invariants for instance. Or should aggregate roots only be related by their IDs and never receive the full aggregate? Should one use a factory service then instead?

E.G.: AR Timesheet can only receive ContractId, Factory can receive full Contract object.

An example for the latter by @jbogard:

Going from a data-driven approach, this will cause pain. If we write our persistence tests first, we run into “why do I need a Customer just to test Order persistence?” Or, why do we need a Customer to test order totalling logic? The problem occurs further down the line when you start writing code against a domain model that doesn’t enforce its own invariants. If invariants are only satisfied through domain services, it becomes quite tricky to understand what a true “Order” is at any time. Should we always code assuming the Customer is there? Should we code it only when we “need” it?

If our entity always satisfies its invariants because its design doesn’t allow invariants to be violated, the violated invariants can never occur. We no longer need to even think about the possibility of a missing Customer, and can build our software under that enforced rule going forward. In practice, I’ve found this approach actually requires less code, as we don’t allow ourselves to get into ridiculous scenarios that we now have to think about going forward.

https://lostechies.com/jimmybogard/2010/02/24/strengthening-your-domain-aggregate-construction/

And by @dylan-smith:

If we come back to the original topic – aggregate boundaries – these come into play because it turns out it’s pretty straightforward to enforce invariants within an aggregate, but if you have an invariant that spans multiple aggregates, it becomes significantly harder.

There’s certainly techniques for enforcing invariants that cross aggregate boundaries, but it definitely adds complexity (more on this later).

http://geekswithblogs.net/Optikal/archive/2013/04/07/152643.aspx

http://geekswithblogs.net/Optikal/archive/2013/05/05/152869.aspx

So now we have two competing desires, larger aggregates give us flexibility for enforcing invariants, smaller aggregates give us less chance of concurrency exceptions. We have to make a tradeoff between these two properties. We have a little more flexibility than just size of the aggregate though, we can strategically choose how to place those boundaries. You can have to 2 similarly sized aggregates, that encompass different sets of entities; one of those aggregate boundaries may be better than the other.

What I try to do is choose aggregate boundaries such that most of my system’s invariants will not have to span aggregates, but also try to choose them such that there is a small likelihood that multiple users will be simultaneously updating the same aggregate. Ultimately this all comes down to examining your business domain, and expected usage patterns of your application in order to make the best decision here. Let’s look at a couple of examples.

http://geekswithblogs.net/Optikal/archive/2013/04/15/152692.aspx

webdevilopers commented 7 years ago

I've been studying the @PPPDDD Principles, Practices and Patterns of Domain-Driven Design book by @elbandit again @NTCoding.

And I think I found a / "the" solution!

Timesheets and InspectionResults belong to the BC of the Contract. Both can be regarded as Value Objects. But they grow in size which will cause performance problems. LAZY LOADING is more or less a warning sign that something is wrong with aggregates. In addition it makes no sense to load all these objects when simply wanting to change a single attribute on the Contract. This way Timesheets and InspectionResults become AR themselves. Still Contract holds invariants / policies to protect.

Coming back to my first example https://github.com/webdevilopers/php-ddd/issues/19#issue-177385368 I would suggest the following changes:

Remove the Timesheet associations from Contract.
Keep a factory method for creating Timesheets using the UL e.g. "writeTimesheet"
Protect the invariant inside this factory and then call the constructor of the Timesheet AR and add the ContractId.

use Acme\Contract\ContractBundle;

/** 
 * @ORM\Entity
 */
class Contract
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     */
    public $id;

    /**     
     * @ORM\Column(type="string", name="status")
     */
    protected $status;

    public function writeTimesheet(EmployeeId, $employeeId, $start, $end) {
        // Policy
        if ($this->isClosed()) {
            throw new ContractAlreadyClosedException();
        }

        // Use ORM to create ID as value object, otherwise use ContractId::create($this->id)

        return Timesheet::withContract($this->id, $employeeId, $start, $end);
    }
}

final class Timesheet
{
    private $contractId;

    private function __construct(ContractId $contractId, EmployeeId $employeeId, $start, $end)
    {
        // calling private setters
    }

    public static function withContract(ContractId $contractId, $employeeId, $start, $end)
    {
        return static(...); // using private constructor
    }
}

Now if I had to do additional checks e.g. on the Employee - is he allowed to work on the Contract - I would use a Factory class as Domain Service instead. Because aggregate creation is becoming too complex.

Final thoughts: What does "protecting invariants" really mean resp. who does it address? In my current example anybody could call the static withContract constructor without the policy "is Contract closed" being checked. But I think it is not about allowing a different developer in the same BC to create invalid objects. It is about implementing a right way inside my Application to ensure that it creates a consistent object: a Timesheet requires a ContractId to be persisted.

Or is there any way inside an application to ensure there is no wrong way to create the object by simply using the static ctor?

Maybe @VaughnVernon could share some of his thoughts about this? Thanks in advance!

In this context:

If we start to split up our domain model into smaller aggregates, it reduces the likelihood of concurrency exceptions happening. If we have each Customer as an Aggregate (containing the Orders), then you will only get concurrency exceptions if two users are trying to edit Orders for the same customer. If you make Customer and Order separate aggregates you only get concurrency exceptions if two users are trying to edit the same order at the same time.

So now we have two competing desires, larger aggregates give us flexibility for enforcing invariants, smaller aggregates give us less chance of concurrency exceptions. We have to make a tradeoff between these two properties. We have a little more flexibility than just size of the aggregate though, we can strategically choose how to place those boundaries. You can have to 2 similarly sized aggregates, that encompass different sets of entities; one of those aggregate boundaries may be better than the other.

What I try to do is choose aggregate boundaries such that most of my system’s invariants will not have to span aggregates, but also try to choose them such that there is a small likelihood that multiple users will be simultaneously updating the same aggregate. Ultimately this all comes down to examining your business domain, and expected usage patterns of your application in order to make the best decision here.

http://geekswithblogs.net/Optikal/archive/2013/04/15/152692.aspx

http://geekswithblogs.net/Optikal/archive/2013/05/05/152869.aspx

http://geekswithblogs.net/Optikal/archive/2013/04/07/152643.aspx

Regarding the "Customer / Orders Example" chapter I would go for #3 since many users in my system can add Timesheets to the same Contract simultaneously.

webdevilopers commented 7 years ago

Some days ago I posted a question about Policies (Specifications) and if they should be injected into Domain Models:

https://github.com/webdevilopers/php-ddd/issues/23

Today I found an old slideshow by @mathiasverraes:

http://verraes.net/2013/06/unbreakable-domain-models/

In the end he shows a SpecialOfferSender (Application) Service that has a Policy / Specification injected too.

This made me re-think my final solution I posted the last time:

https://github.com/webdevilopers/php-ddd/issues/19#issuecomment-256578853

Instead of protecting the invariant "Contract must not be closed" through a factory method of another AR I could inject a Policy into the related TimeSheet or Bundle AR itself:

final class Timesheet
{
    private $contractId;

    private function __construct(ContractId $contractId, EmployeeId $employeeId, $start, $end)
    {
        // calling private setters
    }

    public static function withContract(TimeSheetCanBeAdded $policy, ContractId $contractId, $employeeId, $start, $end)
    {
        if (!$policy->isSatisfied($contractId)) {
            throw new ContractAlreadyClosedException();
        }
        return static(...); // using private constructor
    }
}

I prefer passing the contractId only here instead of the full $contract object compared to the example by @mathiasverraes.

Though this comes wth a downside: I need the ContractRepository to fetch the Contract. Otherwise I could have used the Policy directly in my factory method without the need to inject it. But I prefer injecting the Policy instead of the full Contract and simply pass the ContractId.

The Policy itself could be a Domain or an Application Service. It depends on wheter the Contract comes from a different Bounded Context. In that case I could not use a repository but maybe a HTTP request to a REST API.

Currently I share the same BC so I can implement it this way:

final class ResultCanBeAdded
    implements ContractCanBeModified
{
    private $contractRepository;

    public function __construct(ContractRepository $contractRepository)
    {
        $this->contractRepository = $contractRepository;
    }

    public function isSatisfied(ContractId $contractId)
    {
        $contract = $this->contractRepository->contractOfId($contractId);

        return !$contract->isClosed();
    }
}

Later, if the invariants get even more complicated I could use a factory class. It is easier to inject all the policies there and mabye pass the full Contract objects. But currently I think this would lead to an anemic model.

What do you think of this approach?

yvoyer commented 7 years ago

@webdevilopers

I may add a suggestion on how to solve this. Maybe you could use an interface for your Timesheet that would be implemented by the Contract and Employye. From then the invariant for each context would be enforced when creating the TimeSheet, and the invariant would be isolated in its own namespace.

interface TimesheetPolicy {
    public function satisfiesTimesheetPolicy ();
}

Class Contract
{
    public function satisfiesTimesheetPolicy () {
        return ! $this->isClosed ();
    }
}

Timesheet {
    private function __construct(Timesheet Policy,  ...) {
        if (! $policy->satisfiesTimesheetPolicy ()) {
             // ex
        }
        $this->policy = $policy; // for persistence and fk
    }
}

This would also detach the Timesheet ar from yhe contract, since none would know about each other and you could use a AlwaysFalsePolicy for testing.

webdevilopers commented 7 years ago

Nice suggestion @yvoyer ! This would indeed work as long as I keep the related models in the same BC.

yvoyer commented 7 years ago

I'm unsure how to handle interfaces that could cross BC. Whether it is a bad practice or its a soft dependency.

IMO, I would prefer the AR to use the interface from another BC than putting them in the same context when both entity should have stayed in their own context.

webdevilopers commented 7 years ago

Yesterday @prolic suggested the following:

inject a read model in your command handler or even inject the event sourced collection to make the check in the handler before calling the method on the AR Other than that, I am always sceptical when people tell me it's definetely an aggregate root and not a value object From your description above timesheet seems to be a value object

But I'm sceptical since IMO protecting an invariant should not be done by an Application Service like the Command Handler.

I do get the point that Time Sheets look like a VO. But we definitely have to do some reporting queries on the aggregate. It also has to protect it's own invariants. So me made it an AR with its own event store. Especially since a collection of Time Sheets for a Contract could become very large!

But @prolic still makes me think!

What if I added Time Sheet to the Contract AR. The Contract itself has about 15 Events. The Time Sheet would add another 5. So there are a lot of Events for a single event stream. Is this "normal" for an event sourced AR?

prolic commented 7 years ago

20 events is not much :)

webdevilopers commented 7 years ago

I get your point from a Event Sourcing POV @prolic!

If I need reporting queries I create a read model projectiont.

But when I load the Contract AR from the Event Store and it has hundreds of Time Sheets Events to apply in order to fill the Time Sheets collection... - sure I could use snapshots.

But do you really think Time Sheets should not be in their own Stream? Since there is only 1 invariant to protect from a different AR? Everything else is handled inside the Time Sheet AR.

webdevilopers commented 7 years ago

BTW just saw this code snippet from https://vimeo.com/43598193 by @jbogard:

offer_this

If one AR is responsible for creating the other it looks like it is okay to pass the protecting AR / Entity. So I would pass the Contract to the TimeSheet (VO) that could then protect the invariant.

Of course this only works as long as my Time Sheet lives inside the same BC. Otherwise I would have to deal with eventual consistency. Talk to Contract BC and eventually recejt the Time Sheet if the Contract was closed.

msphn commented 4 years ago

https://dddcommunity.org/wp-content/uploads/files/pdf_articles/Vernon_2011_1.pdf

This article helped me a lot. I was facing a similar problem.

webdevilopers commented 4 years ago

Thank you @msphn ! Of course I am familiar with the articles by @VaughnVernon. This is an older question and we managed to implement a good solution with small aggregates living in their own bounded contexts. And via events and process managers we try to ensure "higher level validation" which then will result in more events e.g. "accepted" or "declined".

This makes the implicit explicit and understandable for our developer team.