neolee / wop-community


Example 2: Unable to scrape the English Wikipedia page #308

Closed wang-yulin closed 3 years ago

wang-yulin commented 3 years ago

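When scraping an English Wikipedia page, I got the error shown below. The page opens normally in a browser, and the same scraper can fetch the Chinese version of the page. After some Googling, I tried changing the User-Agent and adding cookies, but neither helped. Could you help diagnose where the problem might be? (My computer has a proxy configured for getting around network restrictions.)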


import requests
from bs4 import BeautifulSoup

def scrap(url):
    # Browser-like User-Agent and cookies copied from a browser session.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
    cookies = {"cookie":"GeoIP=KR:11:Seoul:37.60:126.98:v4; enwikimwuser-sessionId=858c28bc353699db958c; WMF-Last-Access=25-Dec-2020; WMF-Last-Access-Global=25-Dec-2020"}
    response = requests.get(url=url, headers=headers, cookies=cookies)
    # Parse and return the page only on HTTP 200; otherwise this returns None.
    if response.status_code == 200:
        payload = response.content
        soup = BeautifulSoup(payload, "html.parser")
        return soup

url = "https://en.wikipedia.org/wiki/IAU_designated_constellations"
scrap(url)

---------------------------------------------------------------------------------
ConnectionError: ('Connection aborted.', TimeoutError(60, 'Operation timed out'))

neolee commented 3 years ago

Wikipedia requires a proxy, and it has to be one that is in effect for the command line from which you launch jupyter lab.
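As a rough illustration of the point above, the sketch below first checks which proxy settings are visible to the Python process (a proxy that works in the browser does not necessarily reach a kernel started from a terminal), and then passes a proxy to requests explicitly. The helper names `env_proxies`, `explicit_proxies`, and `fetch`, as well as the address `http://127.0.0.1:7890`, are assumptions made for this example; substitute whatever local address and port your own proxy client listens on.

```python
import os


def env_proxies(environ=None):
    """Return the HTTP(S) proxy settings found in the environment, keyed by scheme."""
    if environ is None:
        environ = os.environ
    found = {}
    for scheme in ("http", "https"):
        # Both lowercase and uppercase variable names are in common use.
        value = environ.get(f"{scheme}_proxy") or environ.get(f"{scheme.upper()}_PROXY")
        if value:
            found[scheme] = value
    return found


def explicit_proxies(proxy_url):
    """Build the mapping accepted by the `proxies` argument of requests.get."""
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxy_url, timeout=10):
    """Fetch `url` through the given proxy; raises on connection errors."""
    # Imported here so the two helpers above can be used without requests installed.
    import requests

    return requests.get(url, proxies=explicit_proxies(proxy_url), timeout=timeout)


# Hypothetical local proxy address; this is an assumption, not a value from the thread.
PROXY_URL = "http://127.0.0.1:7890"

# An empty result here suggests the kernel was started without proxy variables.
print("Proxies visible to this process:", env_proxies())
```

Running `fetch("https://en.wikipedia.org/wiki/IAU_designated_constellations", PROXY_URL)` in the notebook would then route the request through that proxy. Alternatively, exporting `HTTPS_PROXY` in the shell before launching jupyter lab should have a similar effect, since requests reads the standard proxy environment variables by default.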

wang-yulin commented 3 years ago

Then it seems my proxy setup isn't up to the job. I won't dwell on this for now. Thank you!