Downloads often stop after the first 4 items during crawling

ifwind commented 3 years ago

During crawling, downloads often stop after the first 4 items and cannot continue. The error below is raised, and opening the URL in a browser shows a message saying that an invalid query was specified.

[scrapy.core.scraper] ERROR: Spider error processing <GET http://apps.webofknowledge.com/summary.do;jsessionid=F1BE23EC2AB3476A597A0064E713EE96?message_key=Server.invalidInput&error_display_redirect=true&message_mode=AdvancedSearch&product=WOS&search_mode=AdvancedSearch&doc=1&qid=13448&SID=6Bk8rLEzBPVZHDUhciI> (referer: http://apps.webofknowledge.com/summary.do;jsessionid=F1BE23EC2AB3476A597A0064E713EE96?product=WOS&doc=1&qid=13448&SID=6Bk8rLEzBPVZHDUhciI&search_mode=AdvancedSearch&update_back2search_link_param=yes)
Traceback (most recent call last):
  File "D:\anaconda3\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "G:\chenyu\wos_crawler-dev\wos_crawler\spiders\wos_advanced_query_spider.py", line 212, in download_result
    file_postfix = re.search(file_postfix_pattern, response.headers[b'Content-Disposition'].decode())
  File "D:\anaconda3\lib\site-packages\scrapy\http\headers.py", line 40, in __getitem__
    return super().__getitem__(key)[-1]
  File "D:\anaconda3\lib\site-packages\scrapy\utils\datatypes.py", line 23, in __getitem__
    return dict.__getitem__(self, self.normkey(key))
KeyError: b'Content-Disposition'
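For anyone reading the traceback: the KeyError comes from indexing `response.headers[b'Content-Disposition']` on a response that is the error page rather than a file download. Below is a minimal sketch of a more defensive check, not the project's actual code. The helper names `is_error_redirect` and `extract_file_postfix` and the regex are hypothetical (the spider's real `file_postfix_pattern` is not shown in this thread), and the headers are assumed to behave like a mapping with bytes keys, as Scrapy's `Headers.get` does.

```python
import re
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical pattern: the spider's real file_postfix_pattern is not shown here.
FILE_POSTFIX_PATTERN = re.compile(r'filename=\S+?\.([A-Za-z0-9]+)')


def is_error_redirect(url: str) -> bool:
    """Return True if the URL carries a message_key parameter,
    as in the Server.invalidInput redirect from the log above."""
    query = parse_qs(urlsplit(url).query)
    return 'message_key' in query


def extract_file_postfix(headers: Mapping[bytes, bytes]) -> Optional[str]:
    """Return the file extension from Content-Disposition, or None
    when the header is missing (e.g. an error page was returned)."""
    raw = headers.get(b'Content-Disposition')
    if raw is None:
        return None
    match = FILE_POSTFIX_PATTERN.search(raw.decode())
    return match.group(1) if match else None
```

With a check like this, a callback could log the failing URL and stop or retry when `extract_file_postfix` returns None, instead of crashing with a KeyError.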

ifwind commented 3 years ago

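When I download through the GUI instead, it only reports that the download succeeded, but the download directory contains no files at all.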

tomleung1996 commented 3 years ago

Hello, please verify that the query is correct and make sure it also returns more than 0 results on the WoS website.

ifwind commented 3 years ago

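Thanks for the reply! The query I use with the API is SO=(a) AND PY=(2003-2019). It is a valid query, since 4 files (2000 records in total) download successfully, but every time the download reaches 2000 records it cannot continue.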
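To make the numbers concrete: 4 files holding 2000 records implies 500 records per file, so the failure would fall on the fifth batch. A small sketch of that arithmetic follows. The batch size of 500 is inferred from the figures above rather than taken from the crawler's code, and `batch_ranges` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def batch_ranges(total: int, batch_size: int = 500) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (start, end) record ranges, one per exported file.
    batch_size=500 is an assumption inferred from 2000 records / 4 files."""
    for start in range(1, total + 1, batch_size):
        yield start, min(start + batch_size - 1, total)
```

Under that assumption, the first four ranges end at record 2000, and the next request would start at record 2001.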

tomleung1996 commented 3 years ago

Hello, I don't think so=(a) is a reasonable query.

tomleung1996 / wos_crawler

Downloads often stop after the first 4 items during crawling #20