libwww-perl / HTML-Parser

The HTML-Parser distribution is a collection of modules that parse and extract information from HTML documents.

inject method - API extension request [rt.cpan.org #5941] #9

Open oalders opened 4 years ago

oalders commented 4 years ago

Migrated from rt.cpan.org#5941 (status was 'open')

Requestors:

Attachments:

From on 2004-04-05 23:25:04 :

Perl version: v5.8.3 built for i386-linux-thread-multi
HTML::Parser version: 3.36
on Linux 2.4.25, Debian testing dist

I am working with emulation of web browsers and found I need to have some level of preprocessing in the HTML parser.  A primitive I could use for this is the ability to inject input immediately after the current parse token.

As best I can tell, when a browser hits a chunk of content such as:
<script>
document.write('<a href="http://www.perl.org/">the stuff</a>');
</script>
it essentially injects that text immediately after the </script> element in the input parse buffer.

The attached patch adds an ->inject(chunk) method to an HTML::Parser object, and is far from a clean patch, but shows my intent.

Here is a sample use of the inject method to do simple preprocessing:

#!/usr/bin/perl
use strict;
use warnings;
use lib 'blib/lib';
use lib 'blib/arch';
use HTML::Parser qw();
use URI::Escape qw();
use IO::String qw();
use IO::Handle qw();

my $h = <<EOF;
<deftag name="foo">bar</deftag>
<deftag name="navbar">
  <foo>
  <table>
  <tr><td><a href="http://www.perl.org/">perl</a>
  <tr><td><a href="http://www.apache.org/">apache</a>
  <tr><td><a href="http://www.mozilla.org/">mozilla</a>
  </table>
</deftag>
<html><head><title>foo</title></head><body>
<navbar>
Testing 1... 2... 3...
</body></html>
EOF

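my %special = ();          # macro name => captured expansion text
my $cdt = undef;           # name of the <deftag> currently being defined
my $p;
my @out = (\*STDOUT);      # output stack; $out[0] is the active handle
$p = HTML::Parser->new(
    'start_h' => [ sub { my($tag, $attr, $txt) = @_;
        if(exists $special{$tag}) {
            # Proposed API (from the attached patch): queue the stored
            # text so it is parsed right after the current token.
            $p->inject($special{$tag});
        } elsif($tag eq 'deftag') {
            # Start capturing the body of a new macro definition.
            $cdt = $attr->{'name'};
            unshift @out, IO::String->new();
        } else {
            $out[0]->print($txt);
        }
    }, 'tag,attr,text' ],
    'text_h' => [ sub { $out[0]->print(shift) }, 'text' ],
    'end_h'  => [ sub { my($tag, $txt) = @_;
        # With the 'tag' argspec, end tags arrive with a leading slash.
        if($tag eq '/deftag') {
            $special{$cdt} = ${$out[0]->string_ref()};
            shift @out;
        } else {
            $out[0]->print($txt);
        }
    }, 'tag,text' ],
) or die "No parser: $!";
$p->parse($h);
$p->eof;    # flush any text still buffered by the parser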
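
For comparison, here is a rough sketch of how the same macro expansion could be approximated with an unpatched HTML::Parser. It is not the attached patch and not equivalent to a real inject primitive: instead of pushing text back into the current parser's buffer, it runs a fresh nested parser over the stored chunk and writes to the same output stack. The names feed and expand_macros are made up for this illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

my %macro;       # macro name => expansion text
my @out;         # stack of output buffers; $out[0] is the active one
my $defining;    # name of the <deftag> currently being captured

# Parse one chunk of HTML, expanding macro tags as they are met.
# (hypothetical helper, stands in for $p->inject in the script above)
sub feed {
    my ($html) = @_;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr, $text) = @_;
            if (exists $macro{$tag}) {
                feed($macro{$tag});        # nested parse instead of inject
            } elsif ($tag eq 'deftag') {
                $defining = $attr->{name};
                unshift @out, '';          # capture the macro body
            } else {
                $out[0] .= $text;
            }
        }, 'tagname,attr,text' ],
        text_h => [ sub { $out[0] .= shift }, 'text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            if ($tag eq 'deftag') {
                $macro{$defining} = shift @out;
            } else {
                $out[0] .= $text;
            }
        }, 'tagname,text' ],
    );
    $p->parse($html);
    $p->eof;                               # flush buffered text
}

# Expand all <deftag> macros in $html and return the resulting markup.
sub expand_macros {
    my ($html) = @_;
    %macro = ();
    @out   = ('');
    feed($html);
    return $out[0];
}
```

Calling expand_macros($h) on the sample document above would return the expanded markup as a string. Two limits of this workaround are worth noting: the nested parser sees each chunk in isolation, so injected text cannot complete a token that starts in the surrounding input, and a macro whose expansion contains its own tag will recurse without end. A real inject method would at least avoid the first problem.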
