lucays closed this issue 2 years ago.
My documentation probably hasn't been updated. Try using `await response.text()`.
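Hmm, that doesn't work either. The `_get_html()` method in Item needs this to be a `str`, but neither `.text` nor `.text()` is one. Should I pass `html=await response.text()` inside the parentheses? Even then it raises an error: `pyppeteer.errors.NetworkError: Protocol Error (Network.getResponseBody): Session closed. Most likely the page has been closed.` That error may just be a problem in pyppeteer itself.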
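The type mismatch described above can be reproduced without a browser. In this sketch, `FakeResponse` is a hypothetical stand-in, not ruia-pyppeteer's response class; the only assumption is that its `text()` is an `async` method. It shows that `response.text` is a bound method and `response.text()` is a coroutine object, and that a `str` exists only after the coroutine is awaited.

```python
import asyncio
import inspect


class FakeResponse:
    """Hypothetical stand-in for a response whose text() is a coroutine method."""

    def __init__(self, body: str) -> None:
        self._body = body

    async def text(self) -> str:
        # A real crawler would read the response body here.
        return self._body


async def main() -> list:
    response = FakeResponse("<ul class='list'><li>item</li></ul>")
    attr = response.text                # bound method, not a str
    coro = response.text()              # coroutine object, still not a str
    is_coro = inspect.iscoroutine(coro)
    html = await coro                   # only after awaiting do we get a str
    return [callable(attr), is_coro, isinstance(html, str), html]


results = asyncio.run(main())
print(results)
```

So anything that expects `html: str` has to receive the awaited value, not the method or the coroutine.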
I'll debug it tomorrow.
@lucays Fixed:
pip install ruia-pyppeteer==0.0.8
Code:
```python
from ruia import AttrField, Item, TextField
from ruia_pyppeteer import PyppeteerSpider as Spider


class JianshuItem(Item):
    target_item = TextField(css_select="ul.list>li")
    author_name = TextField(css_select="a.name")
    author_url = AttrField(attr="href", css_select="a.name")

    async def clean_author_name(self, author_name):
        return author_name.strip()

    async def clean_author_url(self, author_url):
        return f"https://www.jianshu.com{author_url}"


class JianshuSpider(Spider):
    start_urls = ["https://www.jianshu.com/"]
    concurrency = 10

    async def parse(self, response):
        html = await response.page.content()
        async for item in JianshuItem.get_items(html=html):
            # Loading js by using PyppeteerRequest
            print(item)
        await response.browser.close()


if __name__ == "__main__":
    JianshuSpider.start()
```
Output:
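Tested, and it is indeed fixed. Thanks a lot! One small issue: 0.0.8 hasn't been uploaded to PyPI yet, so you currently need `pip install git+https://github.com/ruia-plugins/ruia-pyppeteer`.
Also, would it be possible to not have to close the browser manually? Support for `with` would be even better.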
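The `with`-style cleanup requested above could look roughly like the following sketch. It does not use ruia-pyppeteer at all: `FakeBrowser` and `auto_close` are hypothetical names, and the only assumption about a real browser object is that it has an awaitable `close()` method. The `try`/`finally` inside the context manager ensures `close()` is awaited even if the body raises.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeBrowser:
    """Hypothetical browser stand-in that records whether close() was awaited."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def auto_close(browser):
    """Yield the browser, then always await browser.close() on exit, even on errors."""
    try:
        yield browser
    finally:
        await browser.close()


async def demo() -> FakeBrowser:
    browser = FakeBrowser()
    async with auto_close(browser) as b:
        assert not b.closed  # still open inside the block
    return browser


browser = asyncio.run(demo())
print(browser.closed)
```

Whether such a wrapper fits the spider's lifecycle (one shared browser across many requests) is a separate design question from the mechanics shown here.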
> One small issue: 0.0.8 hasn't been uploaded to PyPI yet

It has been uploaded. You are probably using a PyPI mirror in China.
> not have to close the browser manually... support for `with`

That doesn't fit real-world usage.
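When I create a new file and run the example code directly, the highlighted step, `response.html`, raises an error: `AttributeError: 'PyppeteerResponse' object has no attribute 'html'`
Debugging confirms that the attribute really doesn't exist.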
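Until the example is updated, one hedged workaround is to do what the fixed spider above does and read the rendered DOM with `await response.page.content()`. The helper below is a sketch: `get_page_html`, `FakePage`, and `FakeResponse` are hypothetical names, and the fakes only mimic the reported situation, where the response has a `page` but no `html` attribute.

```python
import asyncio


async def get_page_html(response) -> str:
    """Return HTML as str: use response.html if it exists, else the page's DOM."""
    html = getattr(response, "html", None)
    if isinstance(html, str):
        return html
    # Fallback used in the fixed example: ask the pyppeteer page for its content.
    return await response.page.content()


class FakePage:
    """Hypothetical page stand-in with an async content() method."""

    async def content(self) -> str:
        return "<html><body>rendered</body></html>"


class FakeResponse:
    """Mimics the reported PyppeteerResponse: has .page but no .html attribute."""

    def __init__(self) -> None:
        self.page = FakePage()


html = asyncio.run(get_page_html(FakeResponse()))
print(html)
```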