python-ruia / ruia-pyppeteer

A Ruia plugin for loading JavaScript - pyppeteer
MIT License

Do you know how to build a correct request to this website? #3

Closed marcusau closed 5 years ago

marcusau commented 5 years ago

This website seems to reject my request no matter which page options I set.

import asyncio

# Imports assumed; the original post did not show them.
from ruia import Item, TextField
from ruia_pyppeteer import PyppeteerRequest as Request

domain_page = 'http://stock.hexun.com/7x24h/'

class news_Item(Item):
    target_item = TextField(css_select='div.liveNews')
    publish_times = TextField(css_select="dl.newsDl.clearfix > dt:nth-child(1)", many=True)
    news_contents = TextField(css_select="dl.newsDl.clearfix > dd:nth-child(2)", many=True)

async def test():
    # 'timeout': 0 disables the navigation timeout
    pyppeteer_page_options = {'waitUntil': 'networkidle2', 'timeout': 0}
    request = Request(domain_page, pyppeteer_page_options=pyppeteer_page_options)
    response = await request.fetch()
    item = await news_Item.get_item(html=response.html)
    # Pair each timestamp with its news text
    for publish_time, text in zip(item.publish_times, item.news_contents):
        print(publish_time, text)

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(test())

Thanks a lot

howie6879 commented 5 years ago

Maybe you were banned by the target site; it's not this project's fault.

marcusau commented 5 years ago

> Maybe you were banned by the target site; it's not this project's fault.

If my IP were blocked, I wouldn't be able to open this website in my Chrome browser, right?

However, the website loads normally in Chrome right now.

The spider still cannot access it.
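One possible explanation, which this thread does not confirm, is that the site answers differently depending on how the client identifies itself. A minimal sketch for checking that with the standard library only, independent of Ruia: the header values in `BROWSER_HEADERS` and the helper names `probe` and `compare_clients` are hypothetical and only for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like headers; the exact values are illustrative only.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


def probe(url, headers=None, opener=urllib.request.urlopen, timeout=10):
    """Send one GET request and describe how the server answered."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 403.
        return f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection resets, and socket timeouts.
        return f"failed: {exc}"


def compare_clients(url, opener=urllib.request.urlopen):
    """Probe the same URL with default headers and with browser-like headers."""
    return {
        "default": probe(url, opener=opener),
        "browser_like": probe(url, BROWSER_HEADERS, opener=opener),
    }
```

Calling `compare_clients('http://stock.hexun.com/7x24h/')` returns one result per header set. If the two results differ, that would suggest the site filters on request headers rather than on IP address, although a plain HTTP probe cannot rule out other checks the site may apply.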

howie6879 commented 5 years ago

ruia-pyppeteer just provides a way to load JavaScript, and I can't fix the problem for you without the error log.
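For anyone wanting to attach the requested error log, here is a minimal sketch of one way to capture it. The helper `fetch_with_log` is hypothetical and not part of Ruia or ruia-pyppeteer; it assumes the failing call is a zero-argument coroutine function such as `request.fetch`. Whether `fetch()` raises or only logs the failure internally depends on the library, so treat this as a generic pattern rather than the project's own debugging method.

```python
import asyncio
import logging
import traceback

# Show log records from all libraries, including any the fetcher emits itself.
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("fetch_debug")


async def fetch_with_log(fetch):
    """Await fetch() and return (result, error_text).

    `fetch` is any zero-argument callable returning an awaitable.
    On failure the full traceback is logged and returned as text,
    so it can be pasted into an issue report.
    """
    try:
        return await fetch(), None
    except Exception:
        error_text = traceback.format_exc()
        logger.error("fetch failed:\n%s", error_text)
        return None, error_text
```

In the script from the question, this would be used as `response, error = await fetch_with_log(request.fetch)` inside `test()`. If `error` is `None` but the page content is still missing, the log output printed at DEBUG level is the next thing to include in the report.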