yasserg / crawler4j

Open Source Web Crawler for Java
Apache License 2.0
4.54k stars 1.93k forks

Processing JS #197

Open mateusz-nalepa opened 7 years ago

mateusz-nalepa commented 7 years ago

Is there any way to process JavaScript with crawler4j? Or could the crawler send a request to a Node.js server, which would process the page and return the rendered HTML to crawler4j?

s17t commented 7 years ago

Hi, not in its current state. It would be possible with a stack based on Selenium and/or CasperJS and/or Gebi. I would like to find time to integrate this into crawler4j, since I built a custom JS-enabled crawler for my own use.

ami2bal commented 7 years ago

Would it be possible to add a method "onBeforeParsing(Page page)" to "WebCrawler", called just before "parser.parse(page, curURL.getURL());"?

A subclass could then override the HTML content before it is parsed.
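
A minimal sketch of how such a hook could work, for illustration only. The `Page` and `Crawler` classes below are simplified stand-ins, not crawler4j's real `Page` and `WebCrawler`, and `JsRenderingCrawler`, `crawlOnce`, and the `renderer` function are hypothetical names. The renderer is assumed to be whatever turns raw HTML into JavaScript-processed HTML (a headless browser, a Node.js service, and so on); here it is faked with a string replacement.

```java
import java.util.function.UnaryOperator;

public class BeforeParsingHookSketch {

    /** Simplified stand-in for a fetched page: just a URL and its HTML. */
    public static class Page {
        private final String url;
        private String html;

        public Page(String url, String html) {
            this.url = url;
            this.html = html;
        }

        public String getUrl() { return url; }
        public String getHtml() { return html; }
        public void setHtml(String html) { this.html = html; }
    }

    /** Simplified stand-in for the crawler, showing where the hook would run. */
    public static class Crawler {
        /** Proposed hook: runs after fetching and before parsing. No-op by default. */
        protected void onBeforeParsing(Page page) { }

        /** Calls the hook, then "parses" the page. */
        public final String processPage(Page page) {
            onBeforeParsing(page);
            return parse(page);
        }

        /** Placeholder for the real parser: returns the HTML it would parse. */
        private String parse(Page page) {
            return page.getHtml();
        }
    }

    /** Hypothetical subclass that replaces the raw HTML with rendered HTML. */
    public static class JsRenderingCrawler extends Crawler {
        private final UnaryOperator<String> renderer;

        public JsRenderingCrawler(UnaryOperator<String> renderer) {
            this.renderer = renderer;
        }

        @Override
        protected void onBeforeParsing(Page page) {
            page.setHtml(renderer.apply(page.getHtml()));
        }
    }

    /** Convenience helper: run one page through a rendering crawler. */
    public static String crawlOnce(String rawHtml, UnaryOperator<String> renderer) {
        Page page = new Page("http://example.com/", rawHtml);
        return new JsRenderingCrawler(renderer).processPage(page);
    }

    public static void main(String[] args) {
        // Fake renderer: pretends JavaScript filled in the empty container.
        UnaryOperator<String> fakeRenderer =
                html -> html.replace("<div id=\"app\"></div>", "<div id=\"app\">Hello</div>");
        System.out.println(crawlOnce("<div id=\"app\"></div>", fakeRenderer));
    }
}
```

Keeping the hook a no-op by default means existing crawlers would behave exactly as before, and only subclasses that override it pay the cost of rendering.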