There are no known issues that could lead to this.
Please specify the exact number of nodes and the measurement results before
and after adding the extra node to the document. Also, please clarify the
operation you're performing (I'm assuming it's parsing the text, but I want to
make sure).
Original comment by arseny.k...@gmail.com
on 16 May 2014 at 7:18
What we have noticed is that, after adding a few nodes to the XML, the
select_nodes function takes longer to complete.
This change was noticed in our daily performance tests for our application.
In our test case this function is called approximately 19000 times.
Debugging and profiling showed that CPU time jumps from just under 7 seconds to
just under 12 seconds. That's a difference of 5 seconds.
We weren't expecting a hit like this when adding 10 more lines to the XML.
I attached the XML files in question. The short.xml is the one that remains
fast while the long.xml is the one that takes significantly longer to process.
I hope this makes it a bit clearer as to what we're experiencing and how we're
measuring the impact of the different XML files.
Original comment by boste.zv...@gmail.com
on 16 May 2014 at 7:42
Attachments:
One more question - is your application 32-bit or 64-bit?
Original comment by arseny.k...@gmail.com
on 16 May 2014 at 3:50
Actually, never mind; I was thinking of parsing, but the problem is in XPath
evaluation. Thank you for providing the detailed explanation and the test
documents! I'll take a look.
Original comment by arseny.k...@gmail.com
on 16 May 2014 at 3:54
I looked at the documents and nothing particularly stands out. You have 10 more
parameters (464 -> 474) at the top level I think, but it's hard to imagine a
query that takes much longer because of that.
1. How do you know that the problem is in select_nodes?
2. Do you know which query is much slower than before? Can you show a
representative XPath expression?
Original comment by arseny.k...@gmail.com
on 16 May 2014 at 5:49
As mentioned before, we profiled the DLL and noticed that the combined CPU
time of the line of code that calls this function changed from 7 to 12
seconds.
Since (as you said) the only difference is those 10 extra parameters, we didn't
know what went wrong.
I removed the parameters one by one and found that the delay appears with the
addition of a single new parameter (the critical parameter count is somewhere
between the above-mentioned 464 and 474).
You bring up a good point. It's possible that only a single call (out of
~19000) runs slower.
I will try to test for that particular case next week. If I determine that just
one (or a few) calls are slower I will provide you with more details about
those.
Original comment by boste.zv...@gmail.com
on 16 May 2014 at 11:46
I did some more profiling and have interesting results which may or may not
help you.
I timed the queries and found that 99% of the queries that take more than 2 ms
to execute involve the same parameter.
The exact query is: /Root/Param[@ID='4198528']
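For reference, a minimal standalone version of just this call would look something like the sketch below (illustrative only, not our actual code; the file name simply stands in for the attached XMLs):

#include <pugixml.hpp>
#include <cstdio>

int main()
{
    pugi::xml_document doc;
    if (!doc.load_file("long.xml")) return 1; // or short.xml

    // One-shot form: the expression is recompiled on every call.
    pugi::xpath_node_set nodes = doc.select_nodes("/Root/Param[@ID='4198528']");

    // Precompiled form: the query object can be reused across many evaluations.
    pugi::xpath_query query("/Root/Param[@ID='4198528']");
    pugi::xpath_node_set nodes2 = query.evaluate_node_set(doc);

    std::printf("%u / %u matches\n", (unsigned)nodes.size(), (unsigned)nodes2.size());
    return 0;
}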
What I find puzzling is that, although the code is the same, this query seems
to take longer more often once the XML reaches a certain size.
Is there something specific I should test for? Would making an application
that tries to replicate this help you in any way? (I can't really share our
software code as-is.)
Original comment by boste.zv...@gmail.com
on 27 May 2014 at 12:16
It makes sense that the queries with longer times would be the ones that find
the param by ID; it still does not make sense that the combined query time
almost doubles.
Is the query that you posted here the full one, or just a fragment of a query?
Because the 2ms number sounds very suspicious - this code:
https://gist.github.com/zeux/b9e0234486f6f46f6af3
reports about 0.07 ms/query for both XMLs and 0.09 ms/query if I move
xpath_query construction inside the loop. (note: I'm testing on Mac where
clock() is high-resolution; the same code will misreport timings on Windows due
to lack of clock() precision)
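For reference, the kind of timing loop I mean looks roughly like this (a sketch only, not necessarily identical to the gist; it uses std::chrono here to sidestep the clock() precision issue):

#include <pugixml.hpp>
#include <chrono>
#include <cstdio>

int main()
{
    pugi::xml_document doc;
    if (!doc.load_file("long.xml")) return 1; // run once for short.xml, once for long.xml

    const int iterations = 10000;
    pugi::xpath_query query("/Root/Param[@ID='4198528']"); // constructed outside the loop

    size_t matches = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        matches += query.evaluate_node_set(doc).size();
    auto end = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("%.3f ms/query (%u matches per query)\n", ms / iterations, (unsigned)(matches / iterations));
    return 0;
}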
Making an application that contains just XPath query evaluations and still
reproduces the problem would help a lot. This process can be simplified by
making a trace of all XPath queries -- i.e. for each query evaluation log the
query text and a node identifier, built with something like this, for example:
std::map<pugi::xml_node, int> ids;
doc.find_node([&](pugi::xml_node n) { ids[n] = ids.size(); return false; });
find_node here performs a deep traversal of the document, capturing all nodes
in a map; you can use any alternative mechanisms of recursive traversal.
Logging a node id and query text for each query should allow me to reproduce
the XPath queries you're making, as long as you also dump the raw document with:
doc.save_file("dump.xml", pugi::format_raw | pugi::format_no_declaration);
Alternatively, depending on your platform, you may be able to build against
dynamically linked pugixml; if this is feasible, then you can give me an
executable (without symbols) that reproduces the observed behavior. This is
more complicated because it would require matching C++ runtime versions if you
use pugixml functions that depend on STL.
Original comment by arseny.k...@gmail.com
on 28 May 2014 at 4:31
The 2 ms is not a number that my timing reported.
As far as I could see, most queries were reported as 0 ms, so I filtered for
those that lasted over 2 ms. All the queries that remained took 15 or 16 ms to
execute.
Having said that, the number of queries reported as taking longer (15-16 ms)
grew a lot.
Now, as you said, the actual time they took to execute is probably not 0 or
even 15-16 ms; that's just what the GetTickCount difference was.
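For what it's worth, a higher-resolution measurement could use QueryPerformanceCounter instead of GetTickCount, which typically only advances in roughly 16 ms steps and would explain the 0 and 15-16 ms readings (a sketch, assuming a Windows build):

#include <windows.h>
#include <pugixml.hpp>
#include <cstdio>

int main()
{
    pugi::xml_document doc;
    if (!doc.load_file("long.xml")) return 1;

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq); // ticks per second

    QueryPerformanceCounter(&t0);
    pugi::xpath_node_set result = doc.select_nodes("/Root/Param[@ID='4198528']");
    QueryPerformanceCounter(&t1);

    double ms = 1000.0 * (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    std::printf("%u matches in %.4f ms\n", (unsigned)result.size(), ms);
    return 0;
}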
I stuck with putting timers in our code for now. I'll go and make a test
application and see what kind of difference between the XML files shows up
outside of our software.
Original comment by boste.zv...@gmail.com
on 28 May 2014 at 5:23
OK, I basically ripped that code of yours and used it for my own testing, so
now we have a comparable testing environment. I am using Windows 8.1 as the
testing OS.
Running the test yields this result:
short.xml: 1.244 ms/q
long.xml: 2.289 ms/q
As you can see, there's a big jump, which is exactly what we're experiencing.
Original comment by boste.zv...@gmail.com
on 28 May 2014 at 6:32
We're on to something. Upon closer inspection, the long XML exhibits an edge
case with respect to allocation behavior, causing it to allocate many more
times during query evaluation than the short XML.
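For reference, one way to observe this from the outside is to install counting allocation hooks; pugixml routes its allocations, including XPath evaluation allocations, through these global functions (a rough sketch of the general technique):

#include <pugixml.hpp>
#include <cstdio>
#include <cstdlib>

static size_t alloc_count = 0;

static void* counting_allocate(size_t size) { ++alloc_count; return std::malloc(size); }
static void counting_deallocate(void* ptr) { std::free(ptr); }

int main()
{
    // Install the hooks before any pugixml allocation happens.
    pugi::set_memory_management_functions(counting_allocate, counting_deallocate);

    pugi::xml_document doc;
    if (!doc.load_file("long.xml")) return 1; // compare against short.xml

    alloc_count = 0;
    doc.select_nodes("/Root/Param[@ID='4198528']");
    std::printf("allocations during one query: %u\n", (unsigned)alloc_count);
    return 0;
}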
Your absolute timings still don't make sense to me :) I get 0.07 ms/q for short
and 0.09 ms/q for long in a Release build; a Debug build does get me into the
~1.5 ms/q range (which is mostly the fault of "Basic Runtime Checks"). So it
feels like you're either using an unoptimized build or have a heavy allocator
(e.g. the Windows Debug Heap) enabled.
Anyway, part of this is definitely a problem in pugixml, so I'll try to fix
this.
Original comment by arseny.k...@gmail.com
on 28 May 2014 at 10:42
Can you check if the attached change fixes your problem?
This is a diff from current trunk, as well as the modified .cpp file. Current
trunk is almost the same as 1.4, so you can just overwrite your version if
you're using 1.4 (if you're using an earlier version you'll have to update it
first).
The change saves 1000 allocations (out of 9814) in the test suite and
eliminates the problematic behavior in the test gist posted above.
Original comment by arseny.k...@gmail.com
on 28 May 2014 at 11:34
Attachments:
As you guessed, I was using a debug build, so the times may be different than
with a release build. I wasn't concerned with the actual time it took as much
as the increase in that time relative to the "short" XML.
Now on to the code update. I ran a quick test with the above-mentioned
application and the times are now:
short: 1.240 ms/q
long: 1.251 ms/q
This shows a huge improvement, so I will be including this change in our
software and running full-scale tests.
Original comment by boste.zv...@gmail.com
on 29 May 2014 at 5:16
Full tests have now been run and they show the improvement. CPU times between
the different XMLs are now comparable.
We did notice that memory consumption went up considerably in our tests.
Is this something you would expect considering the changes you made, or not?
It is possible that our results are a fluke (we are running more tests right
now).
We don't really have a problem with somewhat higher memory consumption, but I
would like to know whether this is normal considering we went from pugixml 1.0
to 1.4 (with the code fix).
In any case, thank you for your prompt responses and quick work on this issue.
Original comment by boste.zv...@gmail.com
on 29 May 2014 at 8:26
Nothing should have changed memory-consumption-wise from 1.0 to 1.4...
The change required for this issue slightly increases the peak memory
consumption during evaluation of complex XPath queries, but the increase should
be small (for your query it is ~1-2 KB).
You could try plain 1.4 separately from the patched version to isolate the
issue.
Original comment by arseny.k...@gmail.com
on 29 May 2014 at 3:59
Ignore the memory comment. As I suspected, further tests showed it was a fluke.
I have committed the pugixml 1.4 code with the fix you supplied into our
software.
Do you plan to commit this fix to the pugixml code for the next release?
Original comment by boste.zv...@gmail.com
on 29 May 2014 at 9:03
I submitted a slightly different version of the code as r999.
It results in slightly less memory overhead but still solves the observed
problem in my tests.
Thank you for reporting this and helping to diagnose the issue!
Original comment by arseny.k...@gmail.com
on 1 Jun 2014 at 7:19
Original issue reported on code.google.com by
boste.zv...@gmail.com
on 16 May 2014 at 7:10