Open StefanHri opened 1 year ago
I have the same issue with the following URL: https://www.bcliquorstores.com/product-catalogue
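from requests_html import AsyncHTMLSession

url = 'https://www.bcliquorstores.com/product-catalogue'
headers = {
    'Host': 'www.bcliquorstores.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.7,fa;q=0.3',
}
asession = AsyncHTMLSession()

async def fetch_rendered():
    # Fetch the page, then run its JavaScript in headless Chromium (pyppeteer).
    r = await asession.get(url, headers=headers)
    await r.html.arender()
    return r.html.html

# run() takes coroutine functions and returns a list of their results.
res = asession.run(fetch_rendered)[0]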
I created a small tool to test things like this, since issues like this are not uncommon and some of them are bound not to be the library's fault. That's the nature of crawling: sometimes, for a request to go through, you have to apply different techniques, such as User-Agent spoofing, HTTP/2, TLS spoofing, proxies, etc.
You can see that the response length is identical, which means the problem is in the JS rendering part (pyppeteer).
I will keep investigating, but bear in mind that SPAs usually get their data from other APIs, so it might be interesting to see whether those APIs are publicly available 😉.
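A rough sketch of that kind of length check, for anyone who wants to reproduce it without the tool. It assumes the comparison is between the raw response body and the HTML after rendering, which is only one reading of the comment above. `compare_lengths` and `check_live` are hypothetical helper names, not part of requests_html.

```python
def compare_lengths(raw_html, rendered_html):
    """Compare the size of the raw response body with the rendered HTML."""
    raw_len = len(raw_html)
    rendered_len = len(rendered_html)
    return {
        "raw": raw_len,
        "rendered": rendered_len,
        # Identical lengths suggest rendering added nothing to the page.
        "identical": raw_len == rendered_len,
    }


def check_live(url):
    """Fetch a page, render it, and compare lengths (needs network access)."""
    # Imported here so compare_lengths works without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    r = session.get(url)
    raw_html = r.text      # body as returned by the server
    r.html.render()        # runs the page's JavaScript via pyppeteer
    return compare_lengths(raw_html, r.html.html)
```

Calling `check_live('https://www.bcliquorstores.com/product-catalogue')` should return the two lengths. If `identical` is true, the HTTP request itself is probably fine and the rendering step is the place to look.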
I was having the same issue and found out that it was caused by pyppeteer using a (very) old version of Chromium. Once I upgraded the Chromium browser, things worked as expected.
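The comment above doesn't say how the upgrade was done. One approach that may work is sketched below, assuming your pyppeteer version reads the `PYPPETEER_CHROMIUM_REVISION` environment variable at import time. `pin_chromium_revision` and `render_with_revision` are hypothetical helper names, and the revision number is something you would have to pick for your own platform.

```python
import os


def pin_chromium_revision(revision):
    """Ask pyppeteer to use a specific Chromium revision.

    Assumption: pyppeteer reads PYPPETEER_CHROMIUM_REVISION when it is first
    imported, so this must run before requests_html or pyppeteer is imported.
    """
    os.environ["PYPPETEER_CHROMIUM_REVISION"] = str(revision)
    return os.environ["PYPPETEER_CHROMIUM_REVISION"]


def render_with_revision(url, revision):
    """Render a page after pinning the Chromium revision (needs network access)."""
    pin_chromium_revision(revision)
    # Imported only after the variable is set; if requests_html was already
    # imported elsewhere in the process, the pin has no effect.
    from requests_html import HTMLSession

    session = HTMLSession()
    r = session.get(url)
    r.html.render()  # pyppeteer should download the pinned revision if missing
    return r.html.html
```

Setting the variable in the shell before starting Python should have the same effect and avoids the import-order problem.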
Let me know if this works.
Same issue here
Hi
I have the following code:
which prints:
I am interested in the content of body, but it looks like it is not rendered correctly. What am I doing wrong?