Open rsyqvthv opened 6 years ago
Closed due to inactivity; it also no longer seems relevant.
Hi @2anyone
I have checked the website. The site does not actually support non-JS visits; I tested this myself by disabling JS for the site. So if you want to access it, you would need to call render
at least once.
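A minimal sketch of what calling render once might look like, assuming requests-html's `HTMLSession` and `r.html.render()` (which downloads Chromium the first time it runs). The helper name `fetch_rendered` is hypothetical and not part of the library; the session is passed in so the helper can be tried with any object that offers the same `get` interface.

```python
def fetch_rendered(session, url):
    """Fetch `url`, run the page's JavaScript once, and return the HTML.

    `session` is assumed to be a requests_html.HTMLSession.
    `fetch_rendered` is a hypothetical helper name used for illustration.
    """
    r = session.get(url)
    # render() loads the page in headless Chromium and replaces the
    # content of r.html with the result after the scripts have run.
    r.html.render()
    return r.html.html


def main():
    # Imported here so the helper above can be reused without the library.
    from requests_html import HTMLSession

    url = "http://www.mysupermarket.co.uk/shelves/PersonalOffers_my_Top_Offers_in_ASDA.html"
    html = fetch_rendered(HTMLSession(), url)
    print(html[:500])  # show the start of the rendered source
```

Calling `main()` would try this against the URL from the question; whether the rendered source matches what a browser shows is not something this sketch can guarantee.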
I assume requests-html should work as a non-JavaScript browser if I don't use render(). Correct me if I am wrong.
I ran into this problem with
url='http://www.mysupermarket.co.uk/shelves/PersonalOffers_my_Top_Offers_in_ASDA.html'
When I open this URL in a new incognito window (JS disabled) and check the source code at
view-source:http://www.mysupermarket.co.uk/shelves/PersonalOffers_my_Top_Offers_in_ASDA.html
it shows me a stack of JS code. When I refresh the page, I see the source code I expected. After that, I can visit any page on this site without seeing the stack of JS code again, so I guess some cookies have been set in the session. In requests-html it behaves differently. When I use get to retrieve the page the first time,
r=session.get(url)
the value of r.html.html
is the same as what the non-JavaScript browser shows. But when I retrieve the page a second time, I get a different source, which basically tells me the page does not exist: <span class="Title2">Sorry we couldn't find the page you requested.</span>
Here is the full code I tried (yes, I know it looks silly, but I don't know how to do a refresh in requests-html):
So, what I am asking is: how can I deal with this in requests-html to retrieve the correct source? I don't want to use render(), as the site seems to support non-JS visits.
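One way to narrow down why the second retrieval differs is to record what changes between requests made on the same session. The sketch below is only a diagnostic under assumptions: `trace_requests` is a hypothetical helper name, and it relies on the standard requests attributes that `HTMLSession` inherits (`status_code`, `url`, `history`, `text`, and `session.cookies.get_dict()`). It does not settle whether the site can be read without JavaScript.

```python
def trace_requests(session, url, attempts=2):
    """Request `url` several times on one session and record what changes.

    `session` is assumed to be a requests_html.HTMLSession (or any
    requests.Session). `trace_requests` is a hypothetical helper name.
    """
    trace = []
    for attempt in range(1, attempts + 1):
        r = session.get(url)
        trace.append({
            "attempt": attempt,
            "status": r.status_code,
            # Final URL after redirects; a change here hints at a redirect
            # to an error page on the second request.
            "final_url": r.url,
            "redirects": [h.status_code for h in r.history],
            # Cookies held by the session after this request.
            "cookies": session.cookies.get_dict(),
            "length": len(r.text),
        })
    return trace


def main():
    # Imported here so the helper above can be reused without the library.
    from requests_html import HTMLSession

    url = "http://www.mysupermarket.co.uk/shelves/PersonalOffers_my_Top_Offers_in_ASDA.html"
    for entry in trace_requests(HTMLSession(), url):
        print(entry)
```

If the cookies are the same after both requests, the cookie that the browser ends up with is probably being set by the page's script rather than by a response header, in which case a plain get would not pick it up.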