jhs2jhs / WebDownloadJobsManage

Download Web for purpose
Apache License 2.0

IP blocking restriction #6

Open jhs2jhs opened 10 years ago

jhs2jhs commented 10 years ago

Proxy rotation service: such a service has a large pool of IP addresses and rotates the IP address on every request for a webpage. The target website therefore sees very few requests from the same IP address, and requests from any one IP address are separated by long delays.

http://extract-web-data.com/scrape-detection-and-how-visual-web-ripper-can-help-deal-with-this-problem/
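A minimal sketch of the rotation idea described above, assuming we manage our own local pool of proxy addresses rather than a hosted service. The `ProxyRotator` class, its `min_delay` parameter, and the `192.0.2.x` addresses (a reserved documentation range) are all hypothetical and not part of this repository. The sketch only picks which proxy to use next; it does not send any HTTP requests. It cycles round-robin through the pool and skips any address used less than `min_delay` seconds ago, which gives the "long delay between requests from the same IP" property.

```python
import time
from typing import Callable, Dict, List, Optional


class ProxyRotator:
    """Hypothetical helper: round-robin over a pool of proxy addresses,
    enforcing a minimum delay between two uses of the same address."""

    def __init__(self, proxies: List[str], min_delay: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not proxies:
            raise ValueError("proxy pool must not be empty")
        self._proxies = list(proxies)
        self._min_delay = min_delay
        self._clock = clock  # injectable so the timing can be tested
        self._last_used: Dict[str, float] = {}
        self._index = 0  # position where the next search starts

    def next_proxy(self) -> Optional[str]:
        """Return the next proxy whose last use was at least min_delay
        seconds ago, or None if every proxy is still cooling down."""
        now = self._clock()
        n = len(self._proxies)
        for offset in range(n):
            i = (self._index + offset) % n
            proxy = self._proxies[i]
            last = self._last_used.get(proxy)
            if last is None or now - last >= self._min_delay:
                self._last_used[proxy] = now
                self._index = (i + 1) % n  # continue after this one
                return proxy
        return None


if __name__ == "__main__":
    fake_time = [0.0]
    rotator = ProxyRotator(["192.0.2.1:8080", "192.0.2.2:8080"],
                           min_delay=5.0, clock=lambda: fake_time[0])
    print(rotator.next_proxy())  # 192.0.2.1:8080
    print(rotator.next_proxy())  # 192.0.2.2:8080
    print(rotator.next_proxy())  # None: both were used at t=0
    fake_time[0] = 5.0
    print(rotator.next_proxy())  # 192.0.2.1:8080
```

When `next_proxy()` returns `None`, the caller would wait (or grow the pool). A larger pool means each address is reused less often, which is the trade-off the hosted rotation services make.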

jhs2jhs commented 10 years ago

http://www.imperva.com/docs/wp_detecting_and_blocking_site_scraping_attacks.pdf

jhs2jhs commented 10 years ago

Wikipedia, "Technical measures to stop bots": http://en.wikipedia.org/wiki/Web_scraping#Technical_measures_to_stop_bots

jhs2jhs commented 10 years ago

a survey of web scraping clustering engines. https://s3-us-west-2.amazonaws.com/mlsurveys/89.pdf

jhs2jhs commented 10 years ago

http://stackoverflow.com/questions/4868331/could-a-web-scraper-get-around-a-good-throttle-protection/4871249#4871249

jhs2jhs commented 10 years ago

http://proxy.org/ (web-based proxy)

jhs2jhs commented 10 years ago

http://yacy.net/en/index.html

jhs2jhs commented 10 years ago

anonymous proxies

jhs2jhs commented 10 years ago

http://hidemyass.com/proxy-list/search-225390