Preventing performance decrease with increasing number of nodes

GoogleCodeExporter commented 9 years ago

We have noticed a rather larger performance impact when the number of nodes in 
XMl went over a certain number.
A performance hit of over 50% was noticed when adding a single node to he XML.

Are there any known issues that cause this. Maybe a workaround that would avoid 
these kind of issues.

Let me know if you need more details about the issue we're experiencing.

Original issue reported on code.google.com by boste.zv...@gmail.com on 16 May 2014 at 7:10

GoogleCodeExporter commented 9 years ago

There are no known issues that could lead to this.

Please specify the exact number of nodes and the measurement results before 
adding an extra node to the document and after. Also please clarify the 
operation that you're performing (I'm assuming it's parsing the text, but I 
want to make sure).

Original comment by arseny.k...@gmail.com on 16 May 2014 at 7:18

GoogleCodeExporter commented 9 years ago

What we have noticed is that by adding a few nodes to the XML the select_nodes 
function takes longer to complete.

This change was noticed in our daily performance tests for our application.
In our test case this function is called approximately 19000 times.

Debugging and profiling showed that CPU time jumps from just under 7 seconds to 
just under 12 seconds. That's a difference of 5 seconds.
We weren't expecting a hit like this when adding 10 more lines to the XML.

I attached the XML files in question. The short.xml is the one that remains 
fast while the long.xml is the one that takes significantly longer to process.

I hope this makes it a bit clearer as to what we're experiencing and how we're 
measuring the impact of the different XML files.

Original comment by boste.zv...@gmail.com on 16 May 2014 at 7:42

Attachments:

GoogleCodeExporter commented 9 years ago

One more question - is your application 32-bit or 64-bit?

Original comment by arseny.k...@gmail.com on 16 May 2014 at 3:50

GoogleCodeExporter commented 9 years ago

Actually never mind, I was thinking of parsing, but the problem is in XPath 
evaluation. Thank you for providing the detailed explanation and the test 
documents! I'll take a look.

Original comment by arseny.k...@gmail.com on 16 May 2014 at 3:54

GoogleCodeExporter commented 9 years ago

I looked at the documents and nothing particularly stands out. You have 10 more 
parameters (464 -> 474) at the top level I think, but it's hard to imagine a 
query that takes much longer because of that.

1. How do you know that the problem is in select_nodes?
2. Do you know which query is much slower than before? Can you show a 
representative XPath expression?

Original comment by arseny.k...@gmail.com on 16 May 2014 at 5:49

GoogleCodeExporter commented 9 years ago

As mentioned before we profiled the DLL using and noticed that the combined CPU 
time of the line of the code that calls this function changed from 7 to 12 
seconds.

Since (as you said) the only difference is those 10 extra parameters we didn't 
know what went wrong.

I removed parameter by parameter and basically found out that the delay occurs 
by addition of a single new parameter (critical parameter number is somewhere 
between the above mentioned 464 and 474)

You bring forward a good point. It's possible that only a single call (out of 
~19000) is performed slower.

I will try to test for that particular case next week. If I determine that just 
one (or a few) calls are slower I will provide you with more details about 
those.

Original comment by boste.zv...@gmail.com on 16 May 2014 at 11:46

GoogleCodeExporter commented 9 years ago

I worked with profiling some more and have interesting results which may or may 
not help you.

I timed the queries and found out that 99% of the queries that take more then 
2ms to execute all involve the same parameter.
The exact query is: /Root/Param[@ID='4198528']

What I find puzzling is that although the code is the same this query seems to 
take longer more times once the certain size of the XML.

Is there something specific I should test for. Would making an application 
trying to replicate this help you in any way (I can't really share our software 
code as is)

Original comment by boste.zv...@gmail.com on 27 May 2014 at 12:16

GoogleCodeExporter commented 9 years ago

It makes sense that the queries with longer times would be the ones that find 
the param by ID; it still does not make sense that the combined query time 
almost doubles.

Is the query that you posted here the full one, or just a fragment of a query? 
Because the 2ms number sounds very suspicious - this code: 
https://gist.github.com/zeux/b9e0234486f6f46f6af3

reports about 0.07 ms/query for both XMLs and 0.09 ms/query if I move 
xpath_query construction inside the loop. (note: I'm testing on Mac where 
clock() is high-resolution; the same code will misreport timings on Windows due 
to lack of clock() precision)

Making an application that contains just XPath query evaluations and still 
reproduces the problem would help a lot. This process can be simplified by 
making a trace of all XPath queries -- i.e. for each query evaluation log the 
query text and a node identifier (built with something like this, for example:

std::map<pugi::xml_node, int> ids;
doc.find_node([&](pugi::xml_node n) { ids[n] = ids.size(); return false; });

find_node here performs a deep traversal of the document, capturing all nodes 
in a map; you can use any alternative mechanisms of recursive traversal.

Logging a node id and query text for each query should allow me to reproduce 
the XPath queries you're making, as long as you also dump the raw document with:

doc.save_file("dump.xml", pugi::format_raw | pugi::format_no_declaration);

Alternatively, depending on your platform, you may be able to build against 
dynamically linked pugixml; if this is feasible, then you can give me an 
executable (without symbols) that reproduces the observed behavior. This is 
more complicated because it would require matching C++ runtime versions if you 
use pugixml functions that depend on STL.

Original comment by arseny.k...@gmail.com on 28 May 2014 at 4:31

GoogleCodeExporter commented 9 years ago

The 2ms is not a number that my timing reported.
As far as I could see most queries were reported back as 0ms.

So I went and filtered out those that last over 2ms.
All those that actually appeared after that took 15 or 16ms to execute.

Having said that the number of those queries that were reported as having taken 
longer (15-16ms reported) grew a lot.

Now as you said the actual time they took to execute is probably not 0 or even 
15-16ms. That's just what the "GetTickCount" difference was.

I stuck with putting timers into our code for now. I'll go and make an 
application and test with that to se what kind of difference between XML 
parsing is evident outside of our software.

Original comment by boste.zv...@gmail.com on 28 May 2014 at 5:23

GoogleCodeExporter commented 9 years ago

OK I basically ripped that code of yours and used it for my own testing so now 
we have a comparable testing environment. I am using Windows 8.1 as the testing 
OS.

Running the test yields this result:
short.xml: 1.244 ms/q
long.xml:  2.289 ms/q

As you see there's a big jump which is exactly what wee're experiencing.

Original comment by boste.zv...@gmail.com on 28 May 2014 at 6:32

GoogleCodeExporter commented 9 years ago

We're on to something. Upon closer inspection the long xml exhibits an 
edge-case wrt allocation behavior, causing it to allocate many more times 
during query evaluation compared to the short xml.

Your absolute timings still don't make sense to me :) I get 0.07ms/q for short 
and 0.09ms/q for long in Release build; a Debug build does get me in a ~1.5ms/q 
range (which is mostly the fault of "Basic Runtime Checks"). So it feels like 
you're either using an unoptimized build or have a heavy allocator (i.e. 
Windows Debug Heap) enabled.

Anyway, part of this is definitely a problem in pugixml, so I'll try to fix 
this.

Original comment by arseny.k...@gmail.com on 28 May 2014 at 10:42

GoogleCodeExporter commented 9 years ago

Can you check if the attached change fixes your problem?

This is a diff from current trunk, as well as the modified .cpp file. Current 
trunk is almost the same as 1.4 so you can just overwrite your version if 
you're using 1.4 (if you're using a previous option you'll have to update it).

The change saves 1000 allocations (out of 9814) in the test suite and 
eliminates the problematic behavior in the test gist posted above.

Original comment by arseny.k...@gmail.com on 28 May 2014 at 11:34

Attachments:

GoogleCodeExporter commented 9 years ago

As you guessed I was using a debug build so times may be different then with a 
release build. I wasn't concerned with the actual time it took as much as the 
increase in said time in relation to the "short" xml.

Now on to the code update. Ran a quick test with the abovementioned application 
and times are now:
short: 1.240
long : 1.251

it shows a huge improvement so I will be including this change to our software 
and run full scale tests

Original comment by boste.zv...@gmail.com on 29 May 2014 at 5:16

GoogleCodeExporter commented 9 years ago

Full tests have now been run and they show improvement. CPU times between 
different XMLs are now comparable.

We did notice that memory consumption went up considerably in our tests.
Is this something you would expect considering the changes you made or not? 

It is possible that our results are a fluke (we are making more tests right 
now).

We don't really have any problem with a bit higher memory consumption but I 
would like to know this is normal considering we went from pugi 1.0 to 1.4 
(with code fix).

In any case I thank you for your prompt responses and quick work on this issue.

Original comment by boste.zv...@gmail.com on 29 May 2014 at 8:26

GoogleCodeExporter commented 9 years ago

Nothing should have changed memory-consumption wise from 1.0 to 1.4...

The change required for this issue slightly increases the peak memory 
consumption during evaluation of complex XPath queries, but the increase should 
be small (i.e. for your query the increase is ~1-2Kb).
I guess you can try 1.4 separately from the patched version to isolate the 
issue.

Original comment by arseny.k...@gmail.com on 29 May 2014 at 3:59

GoogleCodeExporter commented 9 years ago

Ignore the memory comment. As I suspected further tests showed it was a fluke 
result.

I have committed pugi 1.4 code with the fix you supplied to our software.
Do you plan to commit this fix to pugi code for next release?

Original comment by boste.zv...@gmail.com on 29 May 2014 at 9:03

GoogleCodeExporter commented 9 years ago

I submitted a slightly different version of the code as r999.
It results in slightly less memory overhead but still solves the observed 
problem in my tests.

Thank you for reporting this and helping to diagnose the issue!

Original comment by arseny.k...@gmail.com on 1 Jun 2014 at 7:19

Changed state: Fixed

letanphuc / pugixml

Preventing performance decrease with increasing number of nodes #229