jackalope / jackalope-doctrine-dbal

Doctrine DBAL transport implementation for Jackalope
http://jackalope.github.io
Other
143 stars 60 forks source link

Improve delete properties performance by replace DOMDocument with xml_parse #432

Closed alexander-schranz closed 9 months ago

alexander-schranz commented 10 months ago

See also: https://github.com/jackalope/jackalope-doctrine-dbal/pull/423


Prototype Repository: https://github.com/alexander-schranz/jackalope-xml-delete-properties

Jackalope Doctrine DBAL Analyse deletion of properties

The benchmark requires 2 files:

Different commands:

php src/remover.php legacy
php src/remover.php single_dom_document
php src/remover.php single_dom_query
php src/remover.php single_dom_query_chunk
php src/remover.php xml_parse

Results

~70000 properties (~12.5MB) remove ~1700 props

Run on a MacBook Pro (16", 2021) Apple M1 Pro 32 GB:

Command Time Memory
legacy 6m 29s 48 MB
single_dom_document 2m 25s 35 MB
single_dom_query 1m 30s 37 MB
single_dom_query_chunk 1m 29s 35 MB
xml_parse ~1s 35 MB

legacy: is the 1.9.0 version: https://github.com/jackalope/jackalope-doctrine-dbal/blob/f7b286f388e0d3a42497c29e597756d6e346fea5/src/Jackalope/Transport/DoctrineDBAL/Client.php#L1804 single_dom_document: should represent the state of 2.0.0-beta2 version after: https://github.com/jackalope/jackalope-doctrine-dbal/pull/423/files

Blackfire

I did not use Blackfire for benchmarking as it did show in past benchmark where xml_parse can not be good profiled as having a lot of callback method being called via xml_set_element_handler and xml_set_character_data_handler. So profiling takes more time as processing things as Blackfire need to log every method call. Instead I depend on classic benchmarking via time() and memory_get_peak_usage(true) measures.

Required changes for improvements

A: Group Properties

The most important thing is that we remove all properties at once instead of calling saveXML after each property removal.

For this we mostly would require first group all deleteProperties by its node:

function groupByNode($deletePropertyPaths): array {
    $grouped = [];
    foreach ($deletePropertyPaths as $path) {
        $nodePath = PathHelper::getParentPath($path);
        $propertyName = PathHelper::getNodeName($path);

        $grouped[$nodePath][] = $propertyName;
    }

    return $grouped;
}

Then we load the single node remove all the properties and save the xml once via saveXML.

B: Grouped Reference delete queries

The queries to remove references should also be grouped and best a single query be send to delete the references instead of one query per reference.

The queries are currently ignored in the benchmark as it is focused on XML manipulation.

C: Replace DOMDocument with xml_parse

DOMDocument is bad for performance and should be avoided. The xml_parse as it allows us to streamed reading the xml and skip the properties which we want to remove. The XmlPropsRemover is an example how this could be done.

D: TODO

currently there is a little difference in the 2 printed xmls:

-rw-r--r--   1 staff  staff  11940901 22 Nov 23:38 removed_legacy.xml
-rw-r--r--   1 staff  staff  12002548 22 Nov 23:39 removed_xml_parse.xml

Update xml_parse variant now has the same output as the previous DOMDocument version:

-rw-r--r--   1 staff  staff  11940901 22 Nov 23:33 removed_legacy.xml
-rw-r--r--   1 staff  staff  11940901 23 Nov 00:18 removed_xml_parse.xml
-rw-r--r--   1 staff  staff  12009559 22 Nov 23:45 removed_legacy_pretty.xml
-rw-r--r--   1 staff  staff  12009559 23 Nov 00:18 removed_xml_parse_pretty.xml

TODO:

dbu commented 10 months ago

that is some really impressive improvement, thanks for the analysis and the implementation.

alexander-schranz commented 10 months ago

Thx for the reviews.

@dbu we will next week do some more profiling on some projects and then I will continue here with some fixes. Pushed this already so we can test it on various projects via:

    "repositories": [
        {
            "type": "vcs",
            "url": "git@github.com:alexander-schranz/jackalope-doctrine-dbal.git"
        }
    ],
    "require": {

        "jackalope/jackalope-doctrine-dbal": "dev-feature/improve-delete-properties-performance as 1.10.0",

    }
dbu commented 10 months ago

i am about to tag the next beta for jackalope-doctrine-dbal 2.0

i think there is nothing preventing us from doing 2.1 very quickly afterwards, once this is ready. but that way we at least should soon have a stable release that installs with symfony 7 - and this change has no BC breaks, it is an optimization that can come as minor version. (unless you see something that you would want to BC break to make things easier - if we need that tell me by monday so we can look into it before releasing 2.0.0 stable.)

alexander-schranz commented 10 months ago

@dbu I will do again a test run against sulu jackolope 2 tomorrow. To release this as new minor for 1 and/or 2 I also see no problem as it is just changing internal logic.

alexander-schranz commented 10 months ago

I did do some adjustements so escaping the property name and value behave the same as before and extend the test case so we are 100% sure the XML special chars are escaped the same way as before.

alexander-schranz commented 10 months ago

Applied the suggestions to the pull request, so it is ready now :) Also did again more testing with special chars and the result is 100% the same for old and new implementations:

Bildschirmfoto 2023-12-11 um 09 53 37
dbu commented 9 months ago

released in 1.11.0