AttributeError: 'PyppeteerResponse' object has no attribute 'html'

lucays commented 2 years ago

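I created a new file and ran the example code as-is. The step that accesses response.html (the part I highlighted) raises: AttributeError: 'PyppeteerResponse' object has no attribute 'html'

Debugging confirms that the attribute really does not exist.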

howie6879 commented 2 years ago

My documentation is probably out of date. Try await response.text() instead.

lucays commented 2 years ago

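"My documentation is probably out of date. Try await response.text() instead."

Hmm, that doesn't work either. The _get_html() method in Item needs this value to be a str, but neither .text nor .text() is one. Did you mean passing html=await response.text() inside the parentheses? Even that raises: pyppeteer.errors.NetworkError: Protocol Error (Network.getResponseBody): Session closed. Most likely the page has been closed. That error may simply be a problem in pyppeteer itself.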
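The str requirement described above can be checked explicitly before the value reaches the Item. The following is only a sketch: get_html and demo are hypothetical helper names, not part of ruia or ruia-pyppeteer, and it assumes the response exposes a pyppeteer page as response.page (as in the fix posted later in this thread) and that the page is still open.

```python
import asyncio


async def get_html(response) -> str:
    """Return the page HTML as a str, whichever attribute the response exposes."""
    # Older docs read `response.html`; use it only if it exists and is a str.
    html = getattr(response, "html", None)
    if isinstance(html, str):
        return html
    # Otherwise ask the (assumed still-open) pyppeteer page for its content.
    page = getattr(response, "page", None)
    if page is None:
        raise AttributeError("response has neither a str `html` nor a `page`")
    html = await page.content()
    if not isinstance(html, str):
        raise TypeError(f"expected str, got {type(html).__name__}")
    return html


def demo() -> str:
    """Exercise get_html with stand-in objects (not the real PyppeteerResponse)."""

    class StubPage:
        async def content(self) -> str:
            return "<html><body>hi</body></html>"

    class StubResponse:
        page = StubPage()

    return asyncio.run(get_html(StubResponse()))
```

Inside a spider's parse method this would be called as html = await get_html(response), and the result passed on as html=html.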

howie6879 commented 2 years ago

I'll debug it tomorrow.

howie6879 commented 2 years ago

@lucays Fixed:

pip install ruia-pyppeteer==0.0.8

Code:

from ruia import AttrField, Item, TextField

from ruia_pyppeteer import PyppeteerSpider as Spider

class JianshuItem(Item):
    target_item = TextField(css_select="ul.list>li")
    author_name = TextField(css_select="a.name")
    author_url = AttrField(attr="href", css_select="a.name")

    async def clean_author_name(self, author_name):
        return author_name.strip()

    async def clean_author_url(self, author_url):
        return f"https://www.jianshu.com{author_url}"

class JianshuSpider(Spider):
    start_urls = ["https://www.jianshu.com/"]
    concurrency = 10

    async def parse(self, response):
        html = await response.page.content()
        async for item in JianshuItem.get_items(html=html):
            # Loading js by using PyppeteerRequest
            print(item)
        await response.browser.close()

if __name__ == "__main__":
    JianshuSpider.start()

Output:

lucays commented 2 years ago

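I tested it and it is indeed fixed, thank you very much! One small issue: 0.0.8 has not been uploaded to PyPI yet, so I had to run pip install git+https://github.com/ruia-plugins/ruia-pyppeteer

Also, would it be possible to avoid calling close manually? Support for with would be even better.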
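What is being asked for could be approximated on the user side with an async context manager. This is a minimal sketch, not a feature of ruia-pyppeteer: closing_browser and demo_close are hypothetical names, and the only assumption about the response is that response.browser.close() is awaitable, as in the example code above.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing_browser(response):
    """Yield the response and close its browser on exit, even if the body raises."""
    try:
        yield response
    finally:
        await response.browser.close()


def demo_close() -> bool:
    """Check the helper against stand-in objects (not real pyppeteer objects)."""

    class StubBrowser:
        closed = False

        async def close(self) -> None:
            self.closed = True

    class StubResponse:
        browser = StubBrowser()

    async def run() -> bool:
        resp = StubResponse()
        async with closing_browser(resp):
            pass  # a real parse() would read resp.page.content() here
        return resp.browser.closed

    return asyncio.run(run())
```

In a spider, the body of parse would sit inside async with closing_browser(response):. Whether closing the browser at the end of every parse call is appropriate depends on how many requests share that browser.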

howie6879 commented 2 years ago

"One small issue: 0.0.8 has not been uploaded to PyPI yet"

It has been uploaded. You are probably using a mirror inside China.
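One way to tell an upload problem from a mirror problem is to ask the official index directly. The sketch below uses PyPI's JSON API at https://pypi.org/pypi/<name>/json; has_release and fetch_project are hypothetical helper names, and fetch_project needs network access.

```python
import json
from urllib.request import urlopen


def has_release(payload: dict, version: str) -> bool:
    """True if the PyPI JSON payload lists at least one file for `version`."""
    return bool(payload.get("releases", {}).get(version))


def fetch_project(name: str, index: str = "https://pypi.org/pypi") -> dict:
    """Download project metadata from PyPI's JSON API (requires network access)."""
    with urlopen(f"{index}/{name}/json", timeout=10) as resp:
        return json.load(resp)


# Example call (performs a network request when uncommented):
# has_release(fetch_project("ruia-pyppeteer"), "0.0.8")
```

If the official index lists the version but a plain pip install cannot find it, the configured mirror is the likely cause.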

"avoid calling close manually... support for with"

That does not fit how the library is used in practice.

python-ruia / ruia-pyppeteer

AttributeError: 'PyppeteerResponse' object has no attribute 'html' #11