jackalope / jackalope-doctrine-dbal

Doctrine DBAL transport implementation for Jackalope
http://jackalope.github.io
Other
143 stars 60 forks source link

Improving `deleteProperties` performance #423

Closed mamazu closed 1 year ago

mamazu commented 1 year ago

I found another improvement in the deleteProperties function. In the old code it would run a node id query for every path you want to delete. Now, it groups the properties by node and then queries the database once for an id.

mamazu commented 1 year ago

The pipeline is failing for code I didn't touch. Should I fix that as well?

dbu commented 1 year ago

if you have some time to spare to fix the phpstan and cs fixer complaints i would of course appreciate that ;-)

alexander-schranz commented 10 months ago

@dbu, @mamazu, @chirimoya I'm doing some tests about improving deleting properties. As in the previous performance improvements the best thing is completely avoid \DOMDocument. I did a quick benchmark and the best solution as always when working with PHP and XML go the workaround over xml_parse.

The Benchmark results and also different implementation of DOMDocument are available (2.0 vs 1.9) and some little improvements on query.

https://github.com/alexander-schranz/jackalope-xml-delete-properties

dbu commented 10 months ago

impressive! we do have quite some test coverage, though edge cases of strange node names and such is probably not extensively covered, if such things become tricky with xml_parse.

should we get this in for 2.0 ?

it sounds like a big change, so i am a bit wary of doing it as 2.1. we could also support both and have an option to enable the xml_parse mode but i would prefer if we can be confident with the new mode and not have to support both - the code is complex enough as it is :-)

alexander-schranz commented 10 months ago

I personally even would do it for 1.x. We will sure do some more testing, but currently I did implement the deletion on real world data homepage with about 40 different languages (including chinese) and 70000 properties so a very complex phpcr node and the resulted xml is 1to1 the same as the current implementation. What I did not yet test is special chars in props names, but if I remember correctly CR is there already very strict and only allow specific chars as property names.

We did already similar things when patching reading the data in 1.7.0: https://github.com/jackalope/jackalope-doctrine-dbal/commit/68392374fc4c4ea5d2380baf486e11a8ca700caa