momer / nutch-selenium

Apache License 2.0

HTTPS protocol support, Regex URL Filter, Fix Readme #5

Closed feelinc closed 8 years ago

feelinc commented 8 years ago

Introduce a regex URL filter for deciding which URLs should use the selenium protocol, and enable the HTTPS protocol.

Create regex-urlselenium.txt with rules similar to regex-urlfilter.txt.

Fix the README to state the exact Firefox version supported by the Selenium version in use.

Tested with Apache Nutch 2.3.0.
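
To illustrate what "rules similar to regex-urlfilter.txt" could mean in practice, here is a minimal sketch, not the code from this pull request. It assumes the usual regex-urlfilter.txt conventions: `#` starts a comment, each rule is `+` or `-` followed by a regular expression, and the first rule that matches decides. The class name `SeleniumUrlRules`, the method `useSelenium`, the example.com rules, and the choice to skip Selenium when no rule matches are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a URL should be fetched through Selenium,
// based on rule lines written in the style of regex-urlfilter.txt.
class SeleniumUrlRules {

    // One parsed rule: the sign (+ or -) and the compiled pattern.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    SeleniumUrlRules(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // The first rule whose pattern is found in the URL decides the outcome.
    // Assumption: when no rule matches, Selenium is not used.
    boolean useSelenium(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SeleniumUrlRules rules = new SeleniumUrlRules(Arrays.asList(
                "# never render static assets in a browser",
                "-\\.(gif|jpg|png|css|js)$",
                "# render pages under /app/ with Selenium",
                "+^https?://example\\.com/app/"));

        System.out.println(rules.useSelenium("https://example.com/app/dashboard")); // true
        System.out.println(rules.useSelenium("https://example.com/app/logo.png"));  // false
        System.out.println(rules.useSelenium("https://example.com/about"));         // false
    }
}
```

Because the first matching rule wins, the exclusion for static assets has to come before the broader `+` rule, just as rule order matters in regex-urlfilter.txt.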

feelinc commented 8 years ago

FYI, I'm not used to coding in Java, so I hope I haven't messed anything up.

momer commented 8 years ago

@feelinc the current recommendation on adding HTTPS support is to enable protocol-httpclient. Please see the README for protocol-selenium for more info.

Alternatively, it looks like you've done quite a bit here, and I'd recommend discussing with the mailing list about your additions! Certainly the Nutch team would appreciate your help on other issues and feature additions beyond the scope of the selenium plugin!

Check out http://nutch.apache.org/mailing_lists.html and be sure to include a link to your work here!