libwww-perl / WWW-Mechanize

Handy web browsing in a Perl object
https://metacpan.org/pod/WWW::Mechanize

$mech->base() does not work with Content-Encoding: gzip #184

Open spazm opened 7 years ago

spazm commented 7 years ago

michael....@gmail.com reported on Dec 2, 2010

What steps will reproduce the problem?

http://www.mscha.org/tmp/basetest.html runs on a server with mod_deflate enabled, and has the following content:

    <html>
        <head>
            <base href="/whatever" />
        </head>
        <body>
            Clicked!
        </body>
    </html>

Now run:

    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.mscha.org/tmp/basetest.html');
    $mech->follow_link(text => 'click me');

Expected: no output.
See instead: Error GETing http://www.mscha.org/tmp/tmp/basetest2.html: Not Found

WWW::Mechanize 1.66 and libwww-perl 5.837

The problem appears to be that $mech->base() blindly uses HTTP::Response's base(), which is not gzip-encoding-aware.
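The suspected cause can be illustrated offline with a minimal sketch that builds a gzip-encoded HTTP::Response by hand. The page body is an assumption (the actual `<base>` and link `href` values are not shown in this report), and the sketch only exercises HTTP::Response, not WWW::Mechanize itself:

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);
use URI;

# ASSUMPTION: illustrative page body; the real href values are not in the report.
my $html = '<html><head><base href="/" /></head>'
         . '<body><a href="tmp/basetest2.html">click me</a></body></html>';

# Compress the body, as a server sending Content-Encoding: gzip would.
gzip( \$html => \my $gzipped ) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $gzipped,
);
$res->request( HTTP::Request->new( GET => 'http://www.mscha.org/tmp/basetest.html' ) );

# base() consults response headers and the request URI; it does not
# decode the body, so the <base> tag inside it plays no part here.
my $base = $res->base;
print "base():        $base\n";

# Resolve the (assumed) relative link against that base.
my $link = URI->new_abs( 'tmp/basetest2.html', $base );
print "resolved link: $link\n";

# The <base> tag is only visible once the body has been decoded.
my $decoded = $res->decoded_content;
print "base tag found after decoding\n" if $decoded =~ /<base\b/i;
```

Under these assumptions the relative link resolves against the page URL rather than the `<base>` target, which matches the doubled `/tmp/tmp/` path in the error above.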

Details

Imported from Google Code issue 188 via archive

Comments

michael....@gmail.com commented on Dec 2, 2010 :

Sorry, pasted the wrong example HTML. It should be:

    <html>
        <head>
            <base href="..." />
        </head>
        <body>
            <a href="...">click me</a>
        </body>
    </html>

petda...@gmail.com commented on Apr 24, 2011 :

(No comment was entered for this change.)

cherac...@gmail.com commented on Jan 27, 2012 :

In case anyone is looking for a workaround, you can turn off compression for a request by sending an 'Accept-Encoding' header with the value 'identity' only. This fixed the problem for me (though it is admittedly inconsiderate to force all servers to send you uncompressed data):

    $mech->get( $current_url, 'Accept-Encoding' => 'identity' );
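If the workaround is needed for every request, including those made by follow_link(), one option may be to set the header once on the object with WWW::Mechanize's add_header(). This is only a sketch of that idea and has not been checked against the versions named in this report:

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Ask for uncompressed responses on every request made through this object,
# instead of passing the header to each get() call individually.
$mech->add_header( 'Accept-Encoding' => 'identity' );
```

Later calls such as `$mech->get(...)` and `$mech->follow_link(...)` on the same object should then send the header as well, with the same bandwidth caveat as above.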