shishaochen / PbcCrawler

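A simple crawler that fetches the open market operations announcements published by the Monetary Policy Department on the official website of the People's Bank of China.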

The central bank's JS script has changed and the JS redirect no longer works #2

Open DC-Melo opened 3 years ago

DC-Melo commented 3 years ago

Traceback (most recent call last):
  File "crawl.py", line 63, in <module>
    main(FLAGS)
  File "crawl.py", line 38, in main
    text = get_url_content(list_page_url)
  File "/home/dc/2P/081_jvm/springboot_webmagic_spider/springboot_webmagic_spider/PbcCrawler/pbc_http.py", line 56, in get_url_content
    page_url = _generate_new_url(e0.html)
  File "/home/dc/2P/081_jvm/springboot_webmagic_spider/springboot_webmagic_spider/PbcCrawler/pbc_http.py", line 45, in _generate_new_url
    tail = ctx.call('getURL')
  File "/home/dc/.local/lib/python3.6/site-packages/execjs/_abstract_runtime_context.py", line 37, in call
    return self._call(name, *args)
  File "/home/dc/.local/lib/python3.6/site-packages/execjs/_external_runtime.py", line 92, in _call
    return self._eval("{identifier}.apply(this, {args})".format(identifier=identifier, args=args))
  File "/home/dc/.local/lib/python3.6/site-packages/execjs/_external_runtime.py", line 78, in _eval
    return self.exec_(code)
  File "/home/dc/.local/lib/python3.6/site-packages/execjs/_abstract_runtime_context.py", line 18, in exec_
    return self._exec_(source)
  File "/home/dc/.local/lib/python3.6/site-packages/execjs/_external_runtime.py", line 88, in _exec_
    return self._extract_result(output)
  File "/home/dc/.local/lib/python3.6/site-packages/execjs/_external_runtime.py", line 167, in _extract_result
    raise ProgramError(value)
execjs._exceptions.ProgramError: TypeError: window[_0x4ce3("81", "Fb!7")] is not a function
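
For anyone debugging this, a first step might be to pull the obfuscated lookup out of the error message so the decoder call can be evaluated separately and the missing window property identified. The following is only a sketch using the standard library; the helper name parse_obfuscated_lookup is hypothetical and not part of PbcCrawler, and the pattern assumes the message has the same shape as the one in the traceback above.

```python
import re

# Assumed shape, copied from the error above: window[_0x4ce3("81", "Fb!7")]
LOOKUP_RE = re.compile(r'window\[(_0x[0-9a-fA-F]+)\("([^"]*)",\s*"([^"]*)"\)\]')


def parse_obfuscated_lookup(message):
    """Return (decoder_name, first_arg, second_arg), or None if no lookup is found.

    Hypothetical helper for illustration; it only parses text and does not
    decode anything.
    """
    match = LOOKUP_RE.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


message = 'TypeError: window[_0x4ce3("81", "Fb!7")] is not a function'
lookup = parse_obfuscated_lookup(message)
print(lookup)
```

With the decoder name and its two string arguments in hand, the same call could be evaluated inside the JS context to see which property of window the script expects to be a function.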

shishaochen commented 3 years ago

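@DC-Melo Yes, it looks like the site has added another layer of obfuscation. I haven't needed to crawl it recently, so I probably won't spend time analyzing it. If you work out an up-to-date way to parse it, you're welcome to send a PR with the update.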

DC-Melo commented 3 years ago

It uses the next-generation encryption identified by _0xodK='jsjiami.com.v6'. I'm not familiar enough with JS, so it has me stumped.
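
Until someone works out the new scheme, the crawler could at least check for the marker quoted above before handing the script to the JS runtime, and fail with a readable message instead of a ProgramError. This is a minimal sketch, assuming the marker string appears literally in the page's script; detect_jsjiami_version is a hypothetical helper, not an existing function in PbcCrawler.

```python
import re

# Assumption: the obfuscator leaves a literal marker such as 'jsjiami.com.v6'
# in the script text, as in the variable quoted in this thread.
MARKER_RE = re.compile(r"jsjiami\.com\.(v\d+)")


def detect_jsjiami_version(script_text):
    """Return the version tag (for example 'v6') if the marker is present, else None."""
    match = MARKER_RE.search(script_text)
    return match.group(1) if match else None


sample = "var _0xodK='jsjiami.com.v6';"
version = detect_jsjiami_version(sample)
if version is not None:
    print("Page script carries the jsjiami.com marker, version", version)
```

Such a check only reports that the marker is there. It does not undo the obfuscation or resolve the redirect.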