libxml-raku / LibXML-raku

Raku bindings to the libxml2 native library
Artistic License 2.0
11 stars 5 forks source link

t/40reader_mem_error.t sometimes slow #14

Closed dwarring closed 5 years ago

dwarring commented 5 years ago

Completing and passing, but can take an unusually long time. CPU mostly idle.

dwarring commented 5 years ago

profiling doesn't shed much light:

<anon>      gen/moar/stage2/NQPHLL.nqp  1904        1           241090363       27            
<unit-oute  t/40reader_mem_error.t      1           1           241090335       874           
ACCEPTS     SETTING::src/core.c/Bool.p  27          1           30362           437           
postcircum  SETTING::src/core.c/array_  102         1           29867           783           
name        SETTING::src/core.c/Parame  239         1           21720           101           
sub_signat  SETTING::src/core.c/Parame  616         1           21575           49            
ACCEPTS     SETTING::src/core.c/Mu.pm6  19          1           21526           102  

Very large gap between '' and next the entry, although it doesn't claim to be using much inclusive time.

dwarring commented 5 years ago

Added some trace statements. Seems to be the first call of xmlTextReaderRead() that takes all the time.

Most likely libxml2 related. Update: further tracked down to xmlParseTryOrFinish()

markldevine commented 5 years ago

David,

After zef install . a couple of your iterations, I observed the parse performing like a lightning bolt, but every text node request seemed to sleep for .1 or .2 seconds. I didn’t mention it then because you’re covering so much ground at the time and I knew that it could wait.

Yes, I confirm that .getChildrenByTagName($TAG) was dragging a bit. My app reads 97,000+ of them in one go, so it was noticable. Very glad that it is in your sights now.

Thanks,

Mark

From: David Warring notifications@github.com Sent: Monday, September 2, 2019 15:58 To: p6-xml/LibXML-p6 LibXML-p6@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [p6-xml/LibXML-p6] t/40reader_mem_error.t sometimes slow (#14)

Added some trace statements. Seems to be the first call of xmlTextReaderRead() that takes all the time.

Most likely libxml2 related.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/p6-xml/LibXML-p6/issues/14?email_source=notifications&email_token=AHRJF2LQY62P526ATYSRACTQHVV5HA5CNFSM4ISZEOAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5WPIIA#issuecomment-527234080, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AHRJF2LT6VHSXW6V3A7GAYLQHVV5HANCNFSM4ISZEOAA.

markldevine commented 5 years ago

Correction:

@xml-nodes[0].textContent was where I was seeing the delay

Mark

From: Mark Devine Sent: Monday, September 2, 2019 17:28 To: p6-xml/LibXML-p6 reply@reply.github.com Subject: RE: [p6-xml/LibXML-p6] t/40reader_mem_error.t sometimes slow (#14)

David,

After zef install . a couple of your iterations, I observed the parse performing like a lightning bolt, but every text node request seemed to sleep for .1 or .2 seconds. I didn’t mention it then because you’re covering so much ground at the time and I knew that it could wait.

Yes, I confirm that .getChildrenByTagName($TAG) was dragging a bit. My app reads 97,000+ of them in one go, so it was noticable. Very glad that it is in your sights now.

Thanks,

Mark

From: David Warring notifications@github.com<mailto:notifications@github.com> Sent: Monday, September 2, 2019 15:58 To: p6-xml/LibXML-p6 LibXML-p6@noreply.github.com<mailto:LibXML-p6@noreply.github.com> Cc: Subscribed subscribed@noreply.github.com<mailto:subscribed@noreply.github.com> Subject: Re: [p6-xml/LibXML-p6] t/40reader_mem_error.t sometimes slow (#14)

Added some trace statements. Seems to be the first call of xmlTextReaderRead() that takes all the time.

Most likely libxml2 related.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/p6-xml/LibXML-p6/issues/14?email_source=notifications&email_token=AHRJF2LQY62P526ATYSRACTQHVV5HA5CNFSM4ISZEOAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5WPIIA#issuecomment-527234080, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AHRJF2LT6VHSXW6V3A7GAYLQHVV5HANCNFSM4ISZEOAA.

dwarring commented 5 years ago

@markldevine will have a look at .getChildrenByTagName / .textContent

dwarring commented 5 years ago

^^ @markldevine set if that optimisation helps. I'm getting some benefit on a smaller doc.

$ perl6 -I . -M LibXML -e 'my $xml-manager = LibXML.new; for (1..3) {my $start = now; my $doc = $xml-manager.parse(:io("etc/libxml2-api.xml")); $doc[0].textContent; note "fetch: " ~ now - $start;}'
fetch: 0.12039373
fetch: 0.0784591
fetch: 0.0790274

First fetch is down from about 0.18

dwarring commented 5 years ago

Problem was just slightly deeper: The 'I/O errors' were the symptom of a real problem. :!validation, :!load-ext-dtd reader flags weren't being correctly handled. So the test was OK as is. Fix was needed in LibXML::Reader options processing.