Open GoogleCodeExporter opened 9 years ago
[deleted comment]
Hi Vincent,
I've tested a few lines of code and they work for me. Please follow these steps:
Step one:
In PageFetcher, just after creating the httpClient object, add these lines:
httpClient.getAuthSchemes().register(AuthPolicy.NTLM, new AuthSchemeFactory() {
    public AuthScheme newInstance(HttpParams params) {
        return new NTLMScheme(new JCIFSEngine());
    }
});
httpClient.getCredentialsProvider().setCredentials(AuthScope.ANY,
    new NTCredentials(USER, PASS, HOST, DOMAIN));
Step two (the new class is attached):
- Create a new class called JCIFSEngine, for example in the
edu.uci.ics.crawler4j.auth package.
- Copy the code from
https://hc.apache.org/httpcomponents-client-4.3.x/ntlm.html into the JCIFSEngine
class.
Step three:
Enjoy.
Maybe the crawler4j team can integrate these lines of code properly. It would
also be good to be able to choose between kinds of authentication.
We can imagine something like this in PageFetcher:
protected void configureHttpClientAuth() {
    if (isNTLMAuth()) {
        configureNTLMHttpClient();
    } else if (isBASICAuth()) {
        configureBasicHttpClient();
    } else if (isNegotiateAuth()) {
        configureNegotiateHttpClient();
    } else {
        logger.info("No authentication to configure.");
    }
}
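To make the idea concrete, here is a minimal, self-contained sketch of that dispatch. Everything in it is hypothetical: AuthDispatchSketch, AuthType and the applied() list are names invented for illustration and are not part of crawler4j or HttpClient. Each branch only records which configuration step would run; in PageFetcher it would call the real HttpClient setup instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names exist in crawler4j.
public class AuthDispatchSketch {

    // Assumed set of authentication kinds, mirroring the branches above.
    public enum AuthType { NONE, BASIC, NTLM, NEGOTIATE }

    private final AuthType authType;

    // Records which configuration step ran; a stand-in for real HttpClient setup.
    private final List<String> applied = new ArrayList<String>();

    public AuthDispatchSketch(AuthType authType) {
        this.authType = authType;
    }

    // Selects exactly one configuration routine based on the chosen type.
    public void configureHttpClientAuth() {
        switch (authType) {
            case NTLM:
                applied.add("ntlm");      // would register NTLMScheme + NTCredentials
                break;
            case BASIC:
                applied.add("basic");     // would set username/password credentials
                break;
            case NEGOTIATE:
                applied.add("negotiate"); // would register a Negotiate scheme
                break;
            default:
                applied.add("none");      // nothing to configure
                break;
        }
    }

    public List<String> applied() {
        return applied;
    }

    public static void main(String[] args) {
        AuthDispatchSketch sketch = new AuthDispatchSketch(AuthType.NTLM);
        sketch.configureHttpClientAuth();
        System.out.println(sketch.applied());
    }
}
```

Using a single enum value rather than three boolean checks is just one possible design. It guarantees that exactly one scheme is selected, but the isNTLMAuth()-style checks above would work as well.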
Thanks.
Nizar.
Original comment by nizar.salhaji@gmail.com
on 16 Jul 2014 at 1:54
Attachments:
Original comment by avrah...@gmail.com
on 18 Aug 2014 at 3:48
I have added a way to log in as of the latest Rev: 4388892aeb78
Grab the latest from trunk and see if it works for you.
If it doesn't, just integrate the NTLM authentication into it and send me a
patch that I can integrate into the core.
Original comment by avrah...@gmail.com
on 26 Nov 2014 at 5:38
Look at Mario's code here:
https://code.google.com/p/crawler4j/wiki/Crawling_Password_Protected_Sites
Original comment by avrah...@gmail.com
on 2 Dec 2014 at 10:12
Original issue reported on code.google.com by
vincent....@gmail.com
on 17 Jan 2014 at 6:45