jackalope / jackalope-doctrine-dbal

Doctrine DBAL transport implementation for Jackalope
http://jackalope.github.io
Other
143 stars 60 forks source link

Taking advantage of batching property deletes when updating nodes #421

Closed mamazu closed 1 year ago

mamazu commented 1 year ago

Why

Currently the implementation for deleting a list of properties is to iterate over them and treat every delete of a property separately. This is a real determent to the performance of PHPCR, especially with bigger nodes.

Example: I want do delete three properties from the same node in the database. For every property we want to delete this is what it does:

Which in turn means if you want to delete two properties from the same node, then you have to query the database for the node twice and also parse the document twice.

The solution

Instead of defining the batch delete as a list of single deletes why not have the single delete make a batch delete with one element.

This pull request extracts out the logic for deleting a property from a node and extends it that you can delete a list of properties on one node as well.

To illustrate this with the same example: This would be the steps the program takes:

This massively reduces the amount of parsing the document and the amount of reads and writes to the database.

Drawbacks

The new "batching functionality" first creates a map of properties and nodes to delete. Creating this map takes more memory than the brute force approach before.

Others

Maybe we could also backport this speed up the 1.x branch so that project which are currently using the jackalope doctrine adapter also get the speed up.

mamazu commented 1 year ago

@dbu Could you have a look at this? This could be a big performance improvement for everyone trying to delete properties.

dbu commented 1 year ago

it seems that phpstan sees some unhandled edge cases. can you please cover those? and run the cs fixer?

mamazu commented 1 year ago

I fixed the errors phpstan reported, they were actually bugs. But now it should be clean maybe cs is still broken 'cause I tried to fix it manually.

mamazu commented 1 year ago

Furthermore: Should I rebase this PR to 1.x and then upmerge it? Because I think projects like sulu/sulu could really benefit from that right now.

dbu commented 1 year ago

Furthermore: Should I rebase this PR to 1.x and then upmerge it? Because I think projects like sulu/sulu could really benefit from that right now.

is deleting several properties a pattern that comes up a lot in sulu? i would honestly prefer to have the change in the new major version, to favor stability over performance improvements - there is the off-chance that there is some side effect in this refactoring. and 2.0 should be released soon :tm: i already tagged the first beta release of jackalope/jackalope

dbu commented 1 year ago

looks good to me :+1:

can you please fix the remaining CS complaints and add an entry in the changelog for 2.0?

dbu commented 1 year ago

thanks a lot!

mamazu commented 1 year ago

is deleting several properties a pattern that comes up a lot in Sulu?

It doesn't occur at all in Sulu yet. I could speculate that the previously bad performance might be a reason, but I am working on commands to clean the phpcr repository in Sulu and there a well performing delete is necessary for this.

But I guess 2.0 could be also a good idea in case any changes are BC breaks.

dbu commented 1 year ago

there should be no BC breaks, but it is a bigger change in the logic that i feel more comfortable as releasing with 2.0.0. consumers are expected to test new major versions more thoroughly than minor releases.