jamesturk / scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.
https://jamesturk.github.io/scrapeghost/

JavaScript Enabling #56

Closed. d-pizhuk closed this issue 1 year ago

d-pizhuk commented 1 year ago

Some websites need JavaScript to be enabled before they return their HTML content. In scrapers.py you use the Requests library to get the content:

[image]

But with the first website I tried, I got:

[image]

I used the "playwright" library to fix it:

[image]

jamesturk commented 1 year ago

The scrape method accepts HTML for this reason; see this FAQ:

https://jamesturk.github.io/scrapeghost/faq/#can-i-use-httpx-or-seleniumplaywright-can-i-customize-the-headers-etc
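
For anyone landing here, a minimal sketch of that approach might look like the following: render the page in a headless browser, then hand the resulting HTML string to the scraper's scrape method. The helper names `render_with_playwright` and `scrape_rendered` are hypothetical and not part of scrapeghost, and the `wait_until="networkidle"` setting is an assumption that may need tuning per site. The screenshots in the original report are not available, so this is not necessarily the code d-pizhuk used.

```python
from typing import Any, Callable


def render_with_playwright(url: str) -> str:
    """Return the page's HTML after JavaScript has run (hypothetical helper)."""
    # Imported lazily so this module can be loaded without playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # Assumption: waiting for network idle is enough for the page to render.
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def scrape_rendered(
    scraper: Any,
    url: str,
    render: Callable[[str], str] = render_with_playwright,
) -> Any:
    """Fetch rendered HTML with `render`, then pass it to `scraper.scrape`."""
    html = render(url)
    return scraper.scrape(html)


# Possible usage (assumes scrapeghost and playwright are installed, and an
# OpenAI API key is configured; the schema below is only an illustration):
#
#   from scrapeghost import SchemaScraper
#   scraper = SchemaScraper(schema={"title": "string"})
#   result = scrape_rendered(scraper, "https://example.com")
```

Keeping the renderer as a parameter means the same wrapper works with httpx, Selenium, or any other function that returns an HTML string.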