psf / requests-html

Pythonic HTML Parsing for Humans™
http://html.python-requests.org
MIT License

What's the difference between requests-html and a non-JavaScript browser? #179

Open rsyqvthv opened 6 years ago

rsyqvthv commented 6 years ago

I assume requests-html should behave like a browser with JavaScript disabled as long as I don't call render(). Correct me if I am wrong.

I ran into this problem with url='http://www.mysupermarket.co.uk/shelves/PersonalOffers_my_Top_Offers_in_ASDA.html':

When I open this URL in a new incognito window (JS disabled) and check the source via view-source:http://www.mysupermarket.co.uk/shelves/PersonalOffers_my_Top_Offers_in_ASDA.html, it shows me a pile of JS code. When I refresh the page, I see the source I expected. After that, I can visit any page on this site without seeing the pile of JS code again, so I guess some cookies have been set for the session.

In requests-html it is different. On the first fetch, r=session.get(url), r.html.html returns the same content as the browser with JavaScript disabled. On the second fetch, it returns a different source, which basically says the page does not exist: <span class="Title2">Sorry we couldn't find the page you requested.</span>

Here is the full code I tried (yes, I know it looks silly, but I don't know how to do a refresh in requests-html):

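from requests_html import HTMLSession

session = HTMLSession()
r = session.get(url)  # first fetch: returns the JS-only page
r = session.get(url)  # second fetch, standing in for a browser refresh
print(r.html.html)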

So what I am asking is: how can I handle this in requests-html to retrieve the source I need? I don't want to use render(), since the site seems to support visits without JS.
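One way to narrow down why the two fetches differ is to record the status code, final URL, redirect chain, and session cookies after each call. The sketch below is only illustrative: `summarize` is a hypothetical helper name, and it assumes the objects expose the usual requests attributes (`status_code`, `url`, `history`, `cookies.get_dict()`), which HTMLSession inherits from requests.Session.

```python
def summarize(response, session):
    """Collect the details most likely to explain why two fetches differ.

    `summarize` is a hypothetical helper, not part of requests-html.
    """
    return {
        # HTTP status of the final response (e.g. 200 vs 404)
        "status": response.status_code,
        # URL after any redirects were followed
        "final_url": response.url,
        # Status codes of intermediate redirect responses, if any
        "redirects": [hop.status_code for hop in response.history],
        # Cookies currently stored on the session
        "cookies": session.cookies.get_dict(),
    }
```

Calling summarize(r, session) after each session.get(url) and comparing the two dictionaries should show whether the server redirected the second request or set cookies on the first one.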

lmiguelvargasf commented 5 years ago

Closed due to inactivity; it also does not seem relevant.

oldani commented 5 years ago

Hi @2anyone

I have checked the website. The site does not actually support visits without JS; I tested it myself by disabling JS for the site. So if you want to access it, you would need to call render() at least once.
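A minimal sketch of that suggestion follows. The helper name `fetch_rendered_html` and the one-second `sleep` are assumptions for illustration; `render()` is the requests-html call that loads the page in headless Chromium (downloaded on first use) and replaces `r.html` with the JavaScript-executed content. Whether one render is enough for this particular site is not confirmed here.

```python
def fetch_rendered_html(session, url, sleep=1):
    """Fetch `url`, run its JavaScript once, and return the resulting HTML.

    `session` is expected to be a requests_html.HTMLSession.
    `fetch_rendered_html` is a hypothetical helper name.
    """
    r = session.get(url)
    # Execute the page's JavaScript; `sleep` waits after the initial render
    # so that scripts have time to finish (the value 1 is a guess).
    r.html.render(sleep=sleep)
    # After render(), r.html.html holds the rendered markup.
    return r.html.html
```

Usage would look like fetch_rendered_html(HTMLSession(), url), with HTMLSession imported from requests_html and url set as in the original post.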