neos-bot / flow-development-collection

The unified repository containing the Flow core packages, used for Flow development.
https://flow.neos.io/
MIT License
0 stars 0 forks source link

Persistence: object tree validation performance is really slow #20

Open neos-bot opened 10 years ago

neos-bot commented 10 years ago

Jira issue originally created by user jpaardekooper:

Our application has some serious problems with the performance of persisting data of large data sets. I'll try to explain it as clear as possible with a basic scenario.

Say our application has users that can be linked to one or more companies. A company is fairly large object with alot of OneToMany relations, for example Cars, Houses, Pets etc. Alot of stuff. Most of these objects have their own repository as well.

We have a UI that lets us create a Car and link it to a company. The performance of this is fine when the current user isn't link to alot of Companies that don't have alot of Cars, Houses, Pets etc. However, as soon as the data set becomes larger (say, 50 companies each with 25 Cars, Houses and Pets), it takes about 30 seconds to insert one simple Car.

The basic object model:

class Company {

    /****
     * @var string
     */
    protected $name;

    /****
     * @var \Doctrine\Common\Collections\ArrayCollection<Car>
     * @ORM\OneToMany(mappedBy="company")
     * @Flow\Lazy
     */
    protected $cars;

    // alot more fields
}

class Car {

    /****
     * @var string
     */
    $licensePlate

    /****
     * @var Company
     * @ORM\ManyToOne(inversedBy="cars")
     */
    protected $company;

    // some more fields
}

I've tracked the performance down to the TYPO3.Flow/Classes/TYPO3/Flow/Persistence/Doctrine/PersistenceManager.php file, where the Car object is triggered for validation (line 82). It then goes into the validateObject method and validates the Car. But when it validates the Car, it will also validate the Company and the Company will also validate all the Houses, Pets, etc.

A temporary solution is to add the @Flow\IgnoreValidation annotation to the $company property in Car. I don't think that's a very flexible and nice solution and wonder if it could be done better.

Jira-URL: https://jira.neos.io/browse/FLOW-17

neos-bot commented 10 years ago

Comment created by @albe:

I'd suggest solving that by providing @Flow\ValidationGroups annotation for entities. That way, one could annotate the company as @Flow\ValidationGroups("{Controller}") so it is only validated inside controller actions, but not during persistence any more. Of course that completely bypasses persistence validation for that entity, and possibility of invalid data inside the persistence arises when properties are changed inside the controller. That is a business decision though.

A second step could be to do peristence validation only within the bounded contexts, ie only up to aggregate roots. This would match default persistence cascading behaviour, but would also be harder to implement, as we'd have to copy some of the doctrine logic into the object validation (possibly including cascade annotations).

neos-bot commented 10 years ago

Comment created by @albe:

For record keeping, here is a post from the mailing list which relates to this: {quote}First of all, let's take the example of submitting a form to create a new instance of a model. In my case, I have models with complex relationships (an Asset is associated with a Version of a Software, which is associated with a Software, which has Modules, etc.). When I submit a form to create a new instance of an Asset, the validation framework will validate ALL members of my model recursively. So it will validate my Asset fields (name, date, author, etc.) as well as the associations it has to other models, even though those models were not created by the form (Version, Software, Modules). This creates unnecessary queries to the database (to load the other models to validate) as well as wasted CPU cycles and thus the request takes more time to complete. [..]

A smart validation stack would only validate the model's members which were submitted with the form, and not the other associated models (which are only referenced by their unique identifier).

The same goes when the PersistenceManager validates all scheduled insertions/updates to the database: it will blindly validate all members of the models, regardless of the changesets that will be applied in the database. There is no need to re-validate the model all over again when only certain fields have changed. {quote}

The important parts are bolded and suggest the perfect validation logic: Controller validation only validates submitted properties and persistence validation only changeset records. Both cases are hard to detect at validation time though, so this is not trivial to implement.

neos-bot commented 10 years ago

Comment created by @albe:

Some ideas on how to implement the optimized validation:

For property mapping:

For persistence:

neos-bot commented 9 years ago

Comment created by @bwaidelich:

Thanks a lot for the valuable input. IMO this is a major issue for large applications and we should look into it asap!

neos-bot commented 9 years ago

Comment created by @bwaidelich:

FYI: some comments on this one from the #typo3-flow channel: {quote} Kay Strobach [9:29 AM] @bwaidelich: had this as well, i used the no validate approach until now, but a general fix would be really nice :wink:

Christian Müller [9:46 AM] Should be possible to find changed/new entities via the Unit Of Work and just validate them

Christian Müller [9:48 AM] But the question is, what about object validators? I think there should be an option to enable full validation even on unchanged objects, could be that an object validator on a unchanged object fails because it validates some state of related (changed) objects...

Kay Strobach [10:01 AM] yes, but this can be done with an explicit anotation imho ... {quote}

neos-bot commented 9 years ago

Comment created by @albe:

+1 for explicit annotation to force full validation. Should the Object Validator be annotated or the entity property/parameter that should be validated? Possibly the latter makes more sense, as the first would maybe just lead to a number of validators in the wild that force validation and hence bypass this optimization.

neos-bot commented 9 years ago

Comment created by @bwaidelich:

@all thanks for your comments! This morning we discussed with the team and we all think that this would be a very valuable improvement.. Because it changes the behavior it can only be a major breaking change of course so Flow 3.0 (feature freeze on 15th February 2015!) would be the last chance to address this for a long time.. I think the UnitOfWork::getEntityChangeSet($entity) approach is worth a try. I'll happily test, review & discuss this issue but I probably won't have the time to tackle this myself soon.. Anyone feels like making an attempt?

neos-bot commented 9 years ago

Comment created by @albe:

Uff, so soon the feature freeze for 3.0? Not much time left... I'll try to start something when I find time the next weekend, but don't count on me finishing this up. Anyone finding time to also push a few lines is welcome

neos-bot commented 9 years ago

Comment created by @bwaidelich:

Just a little reminder (for all major candidates for Flow 3.0): Feature Freeze is in 10 days! Let me know if you need any help with this.

neos-bot commented 9 years ago

Comment created by @albe:

[~bwaidelich] Please take a short look at my first draft commit. It shows how I imagined implementing the optimization. I think for the persistence case, this is pretty straight-forward and easy. For the controller arguments this is more tricky and also kind of hacky. I'd love to have some feedback there, or someone else jumping in on this and commiting improvements.

neos-bot commented 9 years ago

Comment created by @bwaidelich:

[~aberl] thanks for this! I agree, the persistence case is nicely covered with your change. But there got to be a better way to do that for the controller.. maybe use a special validator..?

neos-bot commented 9 years ago

Comment created by @albe:

I agree, the persistence case is nicely covered with your change. Unfortunately, I found that this is not the case. The problem lies deeper than that.

But there got to be a better way to do that for the controller Yeah, see also my comments in gerrit. I'd try to build a changed entities list inside the propertyMapper and then use that to build an optimal validator chain. However, therefore the process must be changed, as currently the validator chain is built before actual property mapping. Other option would be to pass the changelist around inside the ObjectValidators.

PS: If someone can jump onto my last commit, that would be good, because I might not be able to do anything more before the weekend.

neos-bot commented 8 years ago

Comment created by @albe:

FLOW-436 shows that with latest Doctrine 2.4 changes (onFlush event being invoked every time) this problem becomes even more important to get a proper solution.

I'll try to find some more time to dig into this soonish... there are currently two aproaches to this - syncing validation to actual entity changes (optimal but more complex) or have validation boundaries matching the aggregate (DDD), so validation will not get such a deep hierarchy in most scenarios.

neos-bot commented 8 years ago

Comment created by @bwaidelich:

[~aberl] Did you already find the time to look into this again? I just had that case in a customer project where a GET request took several seconds because a simple DTO (that had relations to persisted entities) was validated recursively o.O (skipping validation for safe requests in general is obviously a different story, but first we should avoid validating unchanged persisted objects..!)

neos-bot commented 8 years ago

Comment created by @albe:

I'll try to find time sometime next week/weekend, as we're having some project deadline this weekend.

neos-bot commented 8 years ago

Comment created by @albe:

My big unfinished attempt at "optimal" validation is now here: https://github.com/neos/flow-development-collection/pull/335

Also, a simple alternative solution considering Aggregate Validation Boundaries is here: https://github.com/neos/flow-development-collection/pull/334

neos-bot commented 8 years ago

Comment created by @kitsunet:

This optimizes validation performance for well designed Aggregates

neos-bot commented 8 years ago

Comment created by @albe:

That last comment by Christian should probably have been done by the commit-bot instead. Also, "this" is meant to reference https://github.com/neos/flow-development-collection/commit/369d60d286b65516b302d45562519cf58ec6d553

neos-bot commented 8 years ago

Comment created by @albe:

Regarding fixing the "issue" for lower versions, how should we go about this? Is this still a concern? I could imagine introducing a simple change that allows for manually overriding the GenericObjectValidator that is used (e.g. through Objects.yaml) and then providing documentation on how to implement an own AggregateBoundaryValidator.