wcong / ants-go

open source, distributed, RESTful crawler engine in Golang
MIT License
363 stars, 124 forks

muiltiply_spider runtime error #17

Closed: shionryuu closed this issue 9 years ago

shionryuu commented 9 years ago

The following error occurs after I run muiltiply_spider several times:

2015/04/22 14:54:03 http.go:41: get request:/crawl
2015/04/22 14:54:03 http.go:91: start spider: muiltiply_spider
2015/04/22 14:54:03 downloader.go:46: start downloader
2015/04/22 14:54:03 scraper.go:47: start scraper
2015/04/22 14:54:03 distributer.go:74: start distributer
2015/04/22 14:54:03 distributer.go:90: muiltiply_spider :distribute: 192.168.206.128:8300 :request: http://www.baidu.com/s?wd=1
2015/04/22 14:54:03 report.go:80: start reporter
2015/04/22 14:54:04 downloader.go:96: muiltiply_spider depth: 0 download url: http://www.baidu.com/s?wd=1
2015/04/22 14:54:14 downloader.go:103: Get http://www.baidu.com/s?wd=1: read tcp 180.76.3.151:80: use of closed network connection
2015/04/22 14:54:14 scraper.go:91: muiltiply_spider :start to scrapy: http://www.baidu.com/s?wd=1
2015/04/22 14:54:14 scraper.go:95: muiltiply_spiderruntime error: invalid memory address or nil pointer dereference
2015/04/22 14:54:15 report.go:101: muiltiply_spider :report request to master: http://www.baidu.com/s?wd=1
2015/04/22 14:54:15 report.go:109: stop reporter
2015/04/22 14:54:16 distributer.go:97: stop distributer
wcong commented 9 years ago

This happens when the website you are crawling closes your TCP connection. muiltiply_spider creates about 10 new requests for every request it scrapes, so ants-go soon sends enough requests that the website closes any further TCP connections. I have added a DownloadInterval option to the settings that lets you slow down the downloader; run ants-go with -h to see the help text for DownloadInterval.

shionryuu commented 9 years ago

@wcong OK, thanks.