Closed GoogleCodeExporter closed 9 years ago
I have already fixed this problem. The original PageFetcher code fails to support
crawling HTTPS pages.
Here is how I fixed it: I subclass the fetcher and register an https Scheme. The
approach is based on the answers at the following URL, with some modifications of my own.
http://stackoverflow.com/questions/2703161/how-to-ignore-ssl-certificate-errors-in-apache-httpclient-4-0
import org.apache.http.conn.scheme.Scheme;
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;

public class MyFetcher extends PageFetcher {

    public MyFetcher(CrawlConfig config) {
        super(config);
        if (config.isIncludeHttpsPages()) {
            try {
                // Replace the default https scheme with one backed by
                // MockSSLSocketFactory, which accepts any certificate and host name.
                httpClient.getConnectionManager().getSchemeRegistry()
                        .unregister("https");
                httpClient.getConnectionManager().getSchemeRegistry().register(
                        new Scheme("https", 443, new MockSSLSocketFactory()));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
Original comment by yaoancheng@gmail.com
on 20 Sep 2012 at 2:55
MockSSLSocketFactory.java
import java.io.IOException;
import java.security.KeyManagementException;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.UnrecoverableKeyException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import org.apache.http.conn.ssl.SSLSocketFactory;
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.conn.ssl.X509HostnameVerifier;

public class MockSSLSocketFactory extends SSLSocketFactory {

    public MockSSLSocketFactory() throws NoSuchAlgorithmException,
            KeyManagementException, KeyStoreException,
            UnrecoverableKeyException {
        super(trustStrategy, hostnameVerifier);
    }

    // Host name verifier that never rejects a host.
    private static final X509HostnameVerifier hostnameVerifier = new X509HostnameVerifier() {
        @Override
        public void verify(String host, SSLSocket ssl) throws IOException {
            // Do nothing
        }

        @Override
        public void verify(String host, String[] cns, String[] subjectAlts)
                throws SSLException {
            // Do nothing
        }

        @Override
        public boolean verify(String s, SSLSession sslSession) {
            return true;
        }

        @Override
        public void verify(String arg0, java.security.cert.X509Certificate arg1)
                throws SSLException {
            // Do nothing
        }
    };

    // Trust strategy that treats every certificate chain as trusted.
    private static final TrustStrategy trustStrategy = new TrustStrategy() {
        @Override
        public boolean isTrusted(java.security.cert.X509Certificate[] arg0,
                String arg1) throws CertificateException {
            return true;
        }
    };
}
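For comparison, the same trust-everything idea can be sketched with JDK classes only, without HttpClient on the classpath. This is a minimal sketch, not the code used in crawler4j: the class name TrustAllSsl and its members are hypothetical and exist only for illustration. Like MockSSLSocketFactory above, it performs no certificate or host name validation.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper for illustration; not part of crawler4j or HttpClient. */
final class TrustAllSsl {

    /** Trust manager that accepts every certificate chain without checking it. */
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: no validation is performed.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: no validation is performed.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    /** Host name verifier that accepts any host name. */
    static final HostnameVerifier ALLOW_ALL_HOSTS = (host, session) -> true;

    /** Builds an SSLContext whose only trust manager is TRUST_ALL. */
    static SSLContext newContext() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] {TRUST_ALL}, new SecureRandom());
        return context;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = newContext();
        System.out.println(context.getProtocol());
    }
}
```

The context and verifier built here play the same roles as trustStrategy and hostnameVerifier in MockSSLSocketFactory: one decides whether a certificate chain is trusted, the other whether the host name matches.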
Original comment by yaoancheng@gmail.com
on 20 Sep 2012 at 2:56
Original comment by avrah...@gmail.com
on 18 Aug 2014 at 3:27
Looks like a good solution.
Still, it doesn't work in all cases, as seen in issue 286.
Original comment by avrah...@gmail.com
on 2 Sep 2014 at 12:27
I would suggest having another look here, perhaps for a better solution:
http://stackoverflow.com/questions/2703161/how-to-ignore-ssl-certificate-errors-in-apache-httpclient-4-0
Original comment by avrah...@gmail.com
on 15 Sep 2014 at 2:24
Fixed at rev: a96701fed185
I have chosen a different, shorter approach (clearer, in my estimation).
Original comment by avrah...@gmail.com
on 15 Sep 2014 at 2:33
Original issue reported on code.google.com by
yaoancheng@gmail.com
on 20 Sep 2012 at 12:14