alexander-schranz closed 9 months ago
That is a really impressive improvement, thanks for the analysis and the implementation.
Thx for the reviews.
@dbu next week we will do some more profiling on some projects, and then I will continue here with some fixes. I pushed this already so we can test it on various projects via:
"repositories": [
    {
        "type": "vcs",
        "url": "git@github.com:alexander-schranz/jackalope-doctrine-dbal.git"
    }
],
"require": {
    "jackalope/jackalope-doctrine-dbal": "dev-feature/improve-delete-properties-performance as 1.10.0"
}
i am about to tag the next beta for jackalope-doctrine-dbal 2.0
i think there is nothing preventing us from doing 2.1 very quickly afterwards, once this is ready. but that way we at least should soon have a stable release that installs with symfony 7 - and this change has no BC breaks, it is an optimization that can come as minor version. (unless you see something that you would want to BC break to make things easier - if we need that tell me by monday so we can look into it before releasing 2.0.0 stable.)
@dbu I will do another test run against sulu jackalope 2 tomorrow. I also see no problem releasing this as a new minor for 1 and/or 2, as it only changes internal logic.
I made some adjustments so that escaping of the property name and value behaves the same as before, and extended the test case so we are 100% sure the XML special chars are escaped the same way as before.
Applied the suggestions to the pull request, so it is ready now :) I also did more testing with special chars, and the result is 100% the same for the old and new implementations:
released in 1.11.0
See also: https://github.com/jackalope/jackalope-doctrine-dbal/pull/423
Prototype Repository: https://github.com/alexander-schranz/jackalope-xml-delete-properties
Jackalope Doctrine DBAL Analyse deletion of properties
The benchmark requires 2 files:
- var/props.xml: the XML of the node from which we want to remove properties (SELECT props FROM phpcr_nodes WHERE identifier = ?)
- var/props.csv: the names of the properties which we want to remove from the given XML (one property name per line)
Different commands:
Results
~70000 properties (~12.5 MB), removing ~1700 props
Run on a MacBook Pro (16", 2021) Apple M1 Pro 32 GB:
- legacy: is the 1.9.0 version: https://github.com/jackalope/jackalope-doctrine-dbal/blob/f7b286f388e0d3a42497c29e597756d6e346fea5/src/Jackalope/Transport/DoctrineDBAL/Client.php#L1804
- single_dom_document: should represent the state of the 2.0.0-beta2 version after: https://github.com/jackalope/jackalope-doctrine-dbal/pull/423/files
Blackfire
I did not use Blackfire for benchmarking, as past benchmarks showed that xml_parse cannot be profiled well: a lot of callback methods are called via xml_set_element_handler and xml_set_character_data_handler. Profiling then takes more time than the processing itself, as Blackfire needs to log every method call. Instead I rely on classic benchmarking via time() and memory_get_peak_usage(true) measurements.
Required changes for improvements
A: Group Properties
The most important thing is that we remove all properties at once instead of calling saveXML after each property removal.
For this we would mostly need to first group all deleteProperties by their node.
Then we load the single node, remove all the properties, and save the XML once via saveXML.
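The grouping idea can be sketched roughly as follows. This is only an illustration under assumptions: the function names `groupPropertiesByNode` and `removePropertiesOnce` are hypothetical, property paths are assumed to look like `/node/path/propertyName`, and the XML is assumed to use `sv:property` elements with an `sv:name` attribute in the JCR system view namespace. It is not the code from the pull request.

```php
<?php

declare(strict_types=1);

// Assumed namespace of the sv: prefix in the stored props XML.
const SV_NAMESPACE = 'http://www.jcp.org/jcr/sv/1.0';

/**
 * Groups property paths by their parent node path (hypothetical helper).
 *
 * @param string[] $propertyPaths e.g. ['/a/title', '/a/body']
 *
 * @return array<string, string[]> node path => property names
 */
function groupPropertiesByNode(array $propertyPaths): array
{
    $grouped = [];
    foreach ($propertyPaths as $path) {
        $pos = (int) strrpos($path, '/');
        $nodePath = 0 === $pos ? '/' : substr($path, 0, $pos);
        $grouped[$nodePath][] = substr($path, $pos + 1);
    }

    return $grouped;
}

/**
 * Removes all given properties from one node's XML and serializes only once.
 *
 * @param string[] $names
 */
function removePropertiesOnce(string $xml, array $names): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Could not parse node XML.');
    }

    $remove = array_flip($names);
    // Snapshot the live node list first, so removals do not disturb iteration.
    $properties = iterator_to_array($dom->getElementsByTagNameNS(SV_NAMESPACE, 'property'));
    foreach ($properties as $property) {
        if (isset($remove[$property->getAttributeNS(SV_NAMESPACE, 'name')])) {
            $property->parentNode->removeChild($property);
        }
    }

    // One saveXML call per node instead of one per removed property.
    return (string) $dom->saveXML();
}
```

With this shape, the caller would loop over the grouped result, load each node's XML once, and write it back once, so the number of parse and serialize steps depends on the number of nodes rather than the number of properties.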
B: Grouped Reference delete queries
The queries to remove references should also be grouped, and ideally a single query should be sent to delete the references instead of one query per reference.
The queries are currently ignored in the benchmark as it is focused on XML manipulation.
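As a rough sketch of what a grouped delete could look like, the helper below builds one statement with positional placeholders for many references. The function name `buildGroupedReferenceDelete` is hypothetical, and the table and column names (`phpcr_nodes_references`, `source_id`, `source_property_name`) are assumptions made for illustration; the actual schema and query in the transport may differ.

```php
<?php

declare(strict_types=1);

/**
 * Builds a single DELETE statement covering many references
 * (hypothetical helper; table and column names are assumptions).
 *
 * @param array<int, array{0: int, 1: string}> $references list of [source id, property name]
 *
 * @return array{0: string, 1: array<int, int|string>} SQL and its positional parameters
 */
function buildGroupedReferenceDelete(array $references, string $table = 'phpcr_nodes_references'): array
{
    if ([] === $references) {
        throw new InvalidArgumentException('No references given.');
    }

    $conditions = [];
    $params = [];
    foreach ($references as [$sourceId, $propertyName]) {
        // One condition per reference, all combined into one statement.
        $conditions[] = '(source_id = ? AND source_property_name = ?)';
        $params[] = $sourceId;
        $params[] = $propertyName;
    }

    $sql = sprintf('DELETE FROM %s WHERE %s', $table, implode(' OR ', $conditions));

    return [$sql, $params];
}
```

The returned pair could then be handed to the database connection in one call. For very large lists it would probably make sense to send the references in chunks, since databases limit the number of bound parameters per statement.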
C: Replace DOMDocument with xml_parse
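A minimal sketch of the streaming idea behind this section: the XML is read with xml_parse, every element is written out again as it arrives, and `sv:property` elements whose `sv:name` is in the removal list are skipped together with their children. The function name `removePropertiesStreaming` is hypothetical and this is not the prototype's actual implementation. The sketch does not re-emit an XML declaration and writes empty elements as an open and close tag pair, so its output is not guaranteed to be byte-identical to what DOMDocument produces.

```php
<?php

declare(strict_types=1);

/**
 * Streams $xml through xml_parse and re-emits everything except the
 * sv:property elements named in $names (hypothetical helper).
 *
 * @param string[] $names
 */
function removePropertiesStreaming(string $xml, array $names): string
{
    $remove = array_flip($names);
    $out = '';
    $skipDepth = 0; // greater than 0 while inside a property that is dropped

    $parser = xml_parser_create('UTF-8');
    // Keep element and attribute names as written (the default upper-cases them).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attributes) use (&$out, &$skipDepth, $remove): void {
            if ($skipDepth > 0) {
                ++$skipDepth; // child of a dropped property

                return;
            }
            if ('sv:property' === $name && isset($remove[$attributes['sv:name'] ?? ''])) {
                $skipDepth = 1; // start dropping this property

                return;
            }
            $out .= '<' . $name;
            foreach ($attributes as $key => $value) {
                $out .= ' ' . $key . '="' . htmlspecialchars($value, ENT_COMPAT | ENT_XML1, 'UTF-8') . '"';
            }
            $out .= '>';
        },
        function ($parser, string $name) use (&$out, &$skipDepth): void {
            if ($skipDepth > 0) {
                --$skipDepth; // reaches 0 when the dropped property closes

                return;
            }
            $out .= '</' . $name . '>';
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$out, &$skipDepth): void {
            if (0 === $skipDepth) {
                // The parser hands over decoded text, so it is escaped again here.
                $out .= htmlspecialchars($data, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
            }
        }
    );

    if (1 !== xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)) ?? 'XML parse error');
    }

    return $out;
}
```

Because nothing is built up as a tree, memory use stays close to the size of the input and output strings, which is the main reason to prefer this over loading a large props document into DOMDocument.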
DOMDocument is bad for performance and should be avoided. xml_parse is the better fit, as it allows us to read the XML as a stream and skip the properties which we want to remove. The XmlPropsRemover is an example of how this could be done.
D: TODO
Currently there is a little difference in the 2 printed XMLs:
Update
xml_parse variant now has the same output as the previous DOMDocument version:
TODO: