I have subclassed WWW::Mechanize to work around a broken website that has placed newlines into various action="" and href="" attributes:
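    package MyMech;
    use strict;
    use warnings;
    use base 'WWW::Mechanize';

    # Join action="" and href="" values that were split by a stray newline
    # (literal or URL-encoded as %0A) before Mechanize stores the page.
    sub update_html {
        my ($self, $html) = @_;
        $html =~ s{(action|href)="([^"\r\n]+)(?:\r?\n|\%0A)([^"]+)"}{$1="$2$3"}sg;
        return $self->SUPER::update_html($html);
    }

    1;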
However, one of their broken hrefs is in the <base> tag, which is successfully modified in content by the above, but this does NOT change the actual base key within the WWW::Mechanize object or the result of $self->base(), and thus attempting to follow relative links still fails due to the invalid character.
I understand that this specific issue is in fact clearly the site in question's problem, but I can envision other situations where it would be desirable to modify the base href and have it reflected within the Mechanize object. At present the only way I can see to do this is by manually modifying $object->{'base'} myself, violating the isolation of the data structure.
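To make the workaround concrete, here is a minimal standalone sketch of the two steps involved: cleaning the attributes and then reading the repaired base href back out of the content. The helper names clean_attr_newlines and extract_base_href and the example.com markup are hypothetical, introduced only for illustration; neither helper is part of WWW::Mechanize.

```perl
use strict;
use warnings;

# Same substitution as in the subclass above: join an attribute value that
# was split by a literal newline or a URL-encoded one (%0A).
sub clean_attr_newlines {
    my ($html) = @_;
    $html =~ s{(action|href)="([^"\r\n]+)(?:\r?\n|\%0A)([^"]+)"}{$1="$2$3"}sg;
    return $html;
}

# Hypothetical helper: return the href of the first <base> tag, or undef
# if the document has none. Only handles double-quoted attribute values.
sub extract_base_href {
    my ($html) = @_;
    return $html =~ m{<base\b[^>]*\bhref="([^"]*)"}i ? $1 : undef;
}

# Made-up sample page with a newline inside the <base> href.
my $html    = qq{<base href="http://example.com/app\n/">\n<a href="page.html">x</a>};
my $cleaned = clean_attr_newlines($html);
my $base    = extract_base_href($cleaned);

print "base: $base\n";    # prints "base: http://example.com/app/"
```

Inside the update_html override, the extracted value could then be written to $self->{base}, which is exactly the encapsulation-breaking step described above. This assumes the stored base is set before update_html is called and is not overwritten afterwards, which I have not verified against every release.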
This issue is present in the current WWW::Mechanize 1.66.
Please let me know if you have any questions!
Regards,
Tim Wilde
twi...@gmail.com reported on Feb 2, 2011
Details
Imported from Google Code issue 196 via archive
WM
Comments
petda...@gmail.com commented on Apr 24, 2011 :
WM