JavaScript Enabling - Githubissues

jamesturk / scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.

https://jamesturk.github.io/scrapeghost/

1.43k stars, 87 forks

JavaScript Enabling #56

Closed by d-pizhuk 1 year ago

d-pizhuk commented 1 year ago

Some websites need JavaScript to be enabled before they return their HTML content. In scrapers.py you use the Requests library to fetch the content. With the first website I tried, that did not give me the page's HTML, so I used the "playwright" library to fix it.

jamesturk commented 1 year ago

The scrape method accepts already-fetched HTML for exactly this reason. See this FAQ:

https://jamesturk.github.io/scrapeghost/faq/#can-i-use-httpx-or-seleniumplaywright-can-i-customize-the-headers-etc
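
The approach described above could look roughly like the sketch below: render the page in a headless browser with Playwright's sync API, then hand the resulting HTML string to the scraper's scrape method. The helper names `fetch_rendered_html` and `scrape_rendered` are hypothetical and not part of scrapeghost. The `networkidle` wait is an assumption that may need tuning per site.

```python
from typing import Any, Callable


def fetch_rendered_html(url: str) -> str:
    """Load `url` in headless Chromium and return the HTML after JavaScript has run.

    Hypothetical helper, not part of scrapeghost. Requires `pip install playwright`
    followed by `playwright install chromium`.
    """
    # Imported lazily so the module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # Assumption: waiting for network idle is enough for the page to render.
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def scrape_rendered(
    scraper: Any,
    url: str,
    fetch: Callable[[str], str] = fetch_rendered_html,
) -> Any:
    """Fetch JavaScript-rendered HTML for `url` and pass it to `scraper.scrape`.

    `scraper` is any object whose `scrape` method accepts an HTML string,
    which is what the maintainer's reply says the scrape method does.
    """
    html = fetch(url)
    return scraper.scrape(html)
```

Here `scraper` would be a scrapeghost `SchemaScraper` instance built with your own schema. The `fetch` parameter lets you swap in httpx, Selenium, or any other way of retrieving the HTML.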