momer / nutch-selenium

nutch - protocol not found for url=https #4

Closed (moees closed 8 years ago)

moees commented 8 years ago

Hi.

I've just tested the plugin with Nutch 1.9, using the patch from NUTCH-1933. It works well with http URLs, but with https URLs I get:

fetch of https://wiki.apache.org/nutch/HttpAuthenticationSchemes failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
    at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:83)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:687)

I tried enabling protocol-httpclient in Nutch, but then Firefox never launches and protocol-selenium seems to be ignored.

Is the plugin supposed to work with httpclient? Have you tried to implement it? What challenges did you face?

Thanks

momer commented 8 years ago

Please see the README for protocol-selenium for info on working with HTTPS.

slylockfox commented 7 years ago

The README says to enable protocol-httpclient, which, as moees wrote, appears to cause the selenium plugin to be completely bypassed. How exactly do we enable protocol-httpclient so that the selenium plugin is not bypassed?
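
One way to picture both symptoms reported in this thread is as a lookup keyed by URL scheme. The following is a minimal Python sketch, not Nutch's actual code: `get_protocol`, the plugin lists, and the set of schemes each plugin declares are all assumptions made for illustration. In particular, it assumes that protocol-selenium declares only `http` and that protocol-httpclient is matched first when both are enabled.

```python
from urllib.parse import urlparse


class ProtocolNotFound(Exception):
    """Raised when no registered plugin declares the URL's scheme."""


def get_protocol(url, plugins):
    """Return the name of the first plugin that declares the URL's scheme.

    `plugins` is an ordered list of (plugin_name, declared_schemes) pairs.
    This is a hypothetical model of a scheme-based lookup, not Nutch's API.
    """
    scheme = urlparse(url).scheme
    for name, schemes in plugins:
        if scheme in schemes:
            return name
    raise ProtocolNotFound(f"protocol not found for url={scheme}")


# Assumption: the selenium plugin declares only "http".
selenium_only = [("protocol-selenium", {"http"})]

# Assumption: protocol-httpclient declares both schemes and is matched first.
with_httpclient = [
    ("protocol-httpclient", {"http", "https"}),
    ("protocol-selenium", {"http"}),
]
```

Under these assumptions, `get_protocol("https://example.com/", selenium_only)` raises `ProtocolNotFound`, while with `with_httpclient` both http and https URLs resolve to protocol-httpclient, so the selenium plugin is never reached. Whether Nutch actually resolves two plugins that declare the same scheme in this order is not confirmed here.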

hussein-alahmad commented 6 years ago

I created a pull request to fix this issue; you can see it here.