Closed nengine closed 10 years ago
You can use focus_crawl and apply your own logic to extract/patch all of the links before they are visited:
crawler.focus_crawl do |page|
  # Swap the encoded en dash sequence in each outgoing link,
  # then normalize it via decode/encode before the crawler visits it.
  page.links.map { |link| URI.encode(URI.decode(link.to_s.gsub("%E2%80%93", "%96"))) }.uniq
end
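The substitution itself can be exercised outside the crawler. A plain-Ruby sketch of the link-patching idea above (patch_link and the sample URLs are hypothetical, and only the byte-sequence swap is shown):

```ruby
# Hypothetical helper mirroring the gsub step in the focus_crawl snippet:
# "%E2%80%93" is the percent-encoded UTF-8 en dash; replace it with "%96".
def patch_link(url)
  url.gsub("%E2%80%93", "%96")
end

links = [
  "http://example.com/page%E2%80%93one",
  "http://example.com/page%E2%80%93one", # duplicate, removed by uniq
]
patched = links.map { |l| patch_link(l) }.uniq
# patched == ["http://example.com/page%96one"]
```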
Ok. Thank you very much.
Is there an option for crawl delay?
You can enable the sleeper plugin: https://github.com/taganaka/polipus/blob/master/lib/polipus/plugins/sleeper.rb

or

polipus.on_page_downloaded { |page| sleep 1 }
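Independent of Polipus, the callback approach amounts to sleeping in the downloaded-page hook, which throttles the crawl rate. A minimal sketch (the DELAY value and the page list are assumptions, not Polipus internals):

```ruby
# Minimal sketch of a per-page crawl delay: the hook sleeps after each
# page, so consecutive fetches are spaced at least DELAY seconds apart.
DELAY = 0.1 # seconds between fetches -- an assumed value, tune per site

on_page_downloaded = ->(page) { sleep DELAY }

start = Time.now
pages = %w[page1 page2 page3] # stand-ins for downloaded pages
pages.each { |page| on_page_downloaded.call(page) }
elapsed = Time.now - start
# elapsed is at least pages.size * DELAY
```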
Hi thanks!
Hello, I have the pattern "%E2%80%93" in some URL strings and need to replace it with "%96" before a Page is saved. Some websites use strange characters in their URLs, and I discovered that some of those characters must be replaced, otherwise the URL cannot be visited. I believe these URLs are stored as links on a page.
Please let me know if there is a way to replace URLs based on a regex pattern before a page is stored.