momer / nutch-selenium

Apache License 2.0

Nutch crawl using selenium plugin doesn't crawl data #3

Open Yoganandh opened 8 years ago

Yoganandh commented 8 years ago

Hi, I am using Nutch 1.10, Selenium 2.44.0 and Firefox 40.0.3. I want to crawl the dynamic content of web pages, and I have followed the instructions at https://github.com/momer/nutch-selenium. When I run the Nutch crawl, the process executes, but when I dump the segments, the dump doesn't contain any content. I face this issue only when I include the "protocol-selenium" plugin. Without this plugin I am able to crawl, and the dump contains the content. I don't know where I am going wrong; please correct me and help me with this.

I am using the command below to start the Nutch crawl:

$ bin/crawl /home/yoganandh/yoga/testnutch/apache-nutch-1.10/runtime/local/urls/seed.txt /home/yoganandh/yoga/testnutch/apache-nutch-1.10/runtime/local/crawl 2


Dump command:

$ bin/nutch readseg -dump crawl/segments/20151006174816 dumpData1 -nocontent -nofetch -nogenerate -noparse -noparsedata
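To tell an empty dump apart from one that only lacks certain sections, it may help to count the section markers in the dump file. The following is a minimal sketch, not part of Nutch: `summarize_dump` is a hypothetical helper, and it assumes that `readseg -dump` writes a text file named `dump` inside the output directory and that records and sections begin with `Recno::`, `Content::` and `ParseText::`. As far as I understand the flags, the dump command above suppresses every section except the parse text, so `Content::` would be expected to stay at 0 there regardless of the plugin.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): count section markers in a
# readseg dump file. Assumes the markers "Recno::", "Content::" and
# "ParseText::" appear at the start of a line in the dump.
summarize_dump() {
    dump_file="$1"
    if [ ! -f "$dump_file" ]; then
        echo "no such dump file: $dump_file" >&2
        return 1
    fi
    for marker in 'Recno::' 'Content::' 'ParseText::'; do
        # grep -c prints the number of matching lines (0 if none)
        count=$(grep -c "^$marker" "$dump_file")
        echo "$marker $count"
    done
}
```

Possible usage, assuming the dump file path is `dumpData1/dump`: `summarize_dump dumpData1/dump`. A `Recno::` count of 0 would suggest that nothing was fetched into the segment at all, while a non-zero `Recno::` count with empty `ParseText::` sections would point at the fetched page content rather than at the dump flags.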

Thanks in advance.