libwww-perl / WWW-Mechanize

Handy web browsing in a Perl object
https://metacpan.org/pod/WWW::Mechanize

update_html unable to affect base href #191

Open spazm opened 7 years ago

spazm commented 7 years ago

twi...@gmail.com reported on Feb 2, 2011

I have subclassed WWW::Mechanize with a simple sub-class to work around a broken website that has placed newlines into various action="" and href="" attributes:

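package MyMech;
use strict;
use warnings;
use base 'WWW::Mechanize';

# Join attribute values that the site split with a raw newline or "%0A",
# then hand the cleaned HTML to the parent class.
sub update_html {
    my ($self, $html) = @_;
    $html =~ s{(action|href)="([^"\r\n]+)(?:\r?\n|\%0A)([^"]+)"}{$1="$2$3"}sg;
    return $self->SUPER::update_html($html);
}

1;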

However, one of their broken hrefs is in the <base> tag. The substitution above fixes it in the content, but this does NOT change the base key stored in the WWW::Mechanize object or the result of $self->base(), so attempting to follow relative links still fails because of the invalid character.

I understand that this specific issue is in fact clearly the site in question's problem, but I can envision other situations where it would be desirable to modify the base href and have it reflected within the Mechanize object.  At present the only way I can see to do this is by manually modifying $object->{'base'} myself, violating the isolation of the data structure.
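To make the workaround concrete, here is a minimal sketch in plain Perl. It reuses the substitution from the report and then pulls the cleaned href out of the <base> tag. The helper names join_split_attrs and base_href and the example.com URL are hypothetical and used only for illustration. The regex-based extraction is an assumption that is good enough for a sketch but is not a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: join attribute values that were split by a raw
# newline or a literal "%0A" (same substitution as in the report).
sub join_split_attrs {
    my ($html) = @_;
    $html =~ s{(action|href)="([^"\r\n]+)(?:\r?\n|\%0A)([^"]+)"}{$1="$2$3"}sg;
    return $html;
}

# Hypothetical helper: return the href of the first <base> tag, or undef.
# A regex is enough for a sketch; a real HTML parser would be more robust.
sub base_href {
    my ($html) = @_;
    return $html =~ m{<base\b[^>]*\bhref="([^"]*)"}i ? $1 : undef;
}

# Made-up page fragment with a newline inside the <base> href.
my $broken = qq{<base href="http://example.com/app\n/">\n<a href="page.html">x</a>};
my $fixed  = join_split_attrs($broken);
my $base   = base_href($fixed);

print "$base\n";
```

Inside the overridden update_html, the value returned by base_href could then be assigned to $self->{base} after calling the parent method. That is exactly the direct access to the object's internals that this report would like to avoid, so treat it as a stopgap rather than a supported interface.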

This issue is present in the current WWW::Mechanize 1.66.

Please let me know if you have any questions!

Regards,
Tim Wilde

Details

Imported from Google Code issue 196 via archive

Comments

petda...@gmail.com commented on Apr 24, 2011 :

(No comment was entered for this change.)