Perl version: v5.8.3 built for i386-linux-thread-multi
HTML::Parser version: 3.36
on Linux 2.4.25, Debian testing dist
I am working on emulating web browsers and have found that I need some level of preprocessing inside the HTML parser. A primitive I could use for this is the ability to inject input immediately after the token currently being parsed.
As best I can tell, when a browser hits a chunk of content such as:
<script>
document.write('<a href="http://www.perl.org/">the stuff</a>');
</script>
it essentially injects that text immediately after the </script> end tag in the parser's input buffer.
The attached patch adds an ->inject(chunk) method to HTML::Parser objects. It is far from a clean patch, but it shows my intent.
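To make the requested semantics concrete, here is a minimal sketch in plain Perl that models "inject immediately after the current token" with a pending-input string. It is only a toy: the expand function and the %macros table are hypothetical names made up for illustration, the tokenizing is a crude regex rather than real HTML parsing, and none of it reflects how the patch or HTML::Parser is actually implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model only: not HTML::Parser, and not the attached patch.
# $pending holds the input that has not been tokenized yet, so the
# front of the string is the current parse position.
sub expand {
    my ($input, $macros) = @_;
    my $pending = $input;
    my $out     = '';
    while (length $pending) {
        if ($pending =~ s/\A<(\w+)>//) {
            my $name = $1;
            if (exists $macros->{$name}) {
                # "Inject": place the replacement at the front of the
                # pending input so it is tokenized next, ahead of the
                # text that followed the tag.
                $pending = $macros->{$name} . $pending;
            } else {
                $out .= "<$name>";
            }
        } elsif ($pending =~ s/\A([^<]+)//) {
            # Plain text up to the next '<' is copied through.
            $out .= $1;
        } else {
            # Anything else starting with '<' (for example an end tag):
            # copy one character and keep going.
            $out .= substr($pending, 0, 1, '');
        }
    }
    return $out;
}

# Hypothetical definitions; note that navbar itself refers to foo.
my %macros = (foo => 'bar', navbar => '<foo> | <b>home</b>');
print expand("a <navbar> z\n", \%macros);
```

Because the injected text goes back through the same tokenizer, a definition that refers to another defined tag is expanded in turn, which is the behavior I am after. A definition that refers to itself would loop forever in this sketch.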
Here is a sample use of the inject method to do simple preprocessing:
#!/usr/bin/perl
use strict;
use warnings;
use lib 'blib/lib';
use lib 'blib/arch';
use HTML::Parser qw();
use URI::Escape qw();
use IO::String qw();
use IO::Handle qw();
my $h = <<EOF;
<deftag name="foo">bar</deftag>
<deftag name="navbar">
<foo>
<table>
<tr><td><a href="http://www.perl.org/">perl</a>
<tr><td><a href="http://www.apache.org/">apache</a>
<tr><td><a href="http://www.mozilla.org/">mozilla</a>
</table>
</deftag>
<html><head><title>foo</title></head><body>
<navbar>
Testing 1... 2... 3...
</body></html>
EOF
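my %special = ();          # tag name => replacement markup captured from <deftag>
my $cdt     = undef;       # name of the <deftag> currently being captured
my $p;                     # declared first so the handlers can close over it
my @out     = (\*STDOUT);  # stack of output handles; $out[0] is the current one

$p = HTML::Parser->new(
    'start_h' => [ sub {
        my ($tag, $attr, $txt) = @_;
        if (exists $special{$tag}) {
            # Defined tag: feed its replacement back in right after this token.
            $p->inject($special{$tag});
        } elsif ($tag eq 'deftag') {
            # Start capturing the definition body into an in-memory handle.
            $cdt = $attr->{'name'};
            unshift @out, IO::String->new();
        } else {
            $out[0]->print($txt);
        }
    }, 'tag,attr,text' ],
    'text_h' => [ sub { $out[0]->print(shift) }, 'text' ],
    'end_h' => [ sub {
        my ($tag, $txt) = @_;
        # For end events the 'tag' argspec includes the leading slash.
        if ($tag eq '/deftag') {
            $special{$cdt} = ${ $out[0]->string_ref() };
            shift @out;
        } else {
            $out[0]->print($txt);
        }
    }, 'tag,text' ],
) or die "No parser: $!";

$p->parse($h);
$p->eof;    # signal end of input so any buffered text is flushed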
Migrated from rt.cpan.org#5941 (status was 'open')
Requestors:
Attachments:
From on 2004-04-05 23:25:04 :
From on 2006-06-18 15:01:07 :
From on 2006-06-18 15:01:13 :
From on 2006-06-18 15:01:17 :
From on 2006-06-18 15:01:21 :
From on 2006-06-18 15:01:25 :