joopies / crawler4j

Automatically exported from code.google.com/p/crawler4j

Make cookie policy configurable #48

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
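Some servers send broken cookies. It would be great to be able to set the cookie 
policy, for example to IGNORE_COOKIES. HttpClient already allows overriding the 
policy per request through the same parameter (this example sets 
BROWSER_COMPATIBILITY): 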

HttpGet httpget = new HttpGet("http://www.broken-server.com/");
// Override the default policy for this request
httpget.getParams().setParameter(
        ClientPNames.COOKIE_POLICY, CookiePolicy.BROWSER_COMPATIBILITY);

See: 
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/statemgmt.html#d4e808
and 
http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/constant-values.html#org.apache.http.client.params.ClientPNames.COOKIE_POLICY
Original issue reported on code.google.com by vonhessl...@gmail.com on 13 May 2011 at 6:26

GoogleCodeExporter commented 9 years ago
The temporary fix for me is to modify the end of the static block of 
PageFetcher like this: 

        connectionManager = new ThreadSafeClientConnManager(params, schemeRegistry);
        logger.setLevel(Level.INFO);
        httpclient = new DefaultHttpClient(connectionManager, params);
        // Ignore all cookies for every request made through this client
        httpclient.getParams().setParameter(
                ClientPNames.COOKIE_POLICY, CookiePolicy.IGNORE_COOKIES);
    }

For this you need to upgrade both the httpcore and httpclient libraries to 
version 4.1. 
This is enough for me to deal with problematic servers. 

Perhaps someone could make this nicely configurable?
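
A rough sketch of what "configurable" could look like, assuming the policy name is read from a properties file. The class name CookiePolicyConfig and the key "fetcher.cookie_policy" are made up for illustration and are not part of crawler4j. The accepted strings are meant to mirror the values of the CookiePolicy constants in httpclient 4.1 (for example "ignoreCookies" for IGNORE_COOKIES), and the fallback assumes BEST_MATCH is the client default; both are worth checking against the constant-values page linked above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper, not part of crawler4j: resolves a cookie policy name from configuration. */
final class CookiePolicyConfig {

    /** Hypothetical property key, chosen only for this example. */
    static final String KEY = "fetcher.cookie_policy";

    /** Assumed string value of CookiePolicy.BEST_MATCH, used when nothing is configured. */
    static final String DEFAULT_POLICY = "best-match";

    /** Assumed string values of the CookiePolicy constants in httpclient 4.1. */
    private static final Set<String> KNOWN = new HashSet<String>(Arrays.asList(
            "best-match", "compatibility", "netscape",
            "rfc2109", "rfc2965", "ignoreCookies"));

    private CookiePolicyConfig() {
    }

    /** Returns the configured policy name, or the default if the key is absent or blank. */
    static String resolve(Properties props) {
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_POLICY;
        }
        value = value.trim();
        if (!KNOWN.contains(value)) {
            // Fail fast on typos instead of silently falling back to the default
            throw new IllegalArgumentException("Unknown cookie policy: " + value);
        }
        return value;
    }
}
```

The static block of PageFetcher could then pass CookiePolicyConfig.resolve(props) as the value for ClientPNames.COOKIE_POLICY instead of the hard-coded CookiePolicy.IGNORE_COOKIES, where props is whatever Properties object the crawler already loads.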

Original comment by vonhessl...@gmail.com on 16 May 2011 at 11:58

GoogleCodeExporter commented 9 years ago
I am working on version 3.0, which uses the httpclient 4.1 libraries. It 
will be available soon.

-Yasser

Original comment by ganjisaffar@gmail.com on 17 May 2011 at 12:03

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:06

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:10