scrapeghost is an experimental library for scraping websites using OpenAI's GPT.
Source: https://github.com/jamesturk/scrapeghost
Documentation: https://jamesturk.github.io/scrapeghost/
Issues: https://github.com/jamesturk/scrapeghost/issues
Use at your own risk. This library makes expensive API calls ($0.36 for a GPT-4 call on a moderately sized page). Cost estimates are based on the OpenAI pricing page and are not guaranteed to be accurate.
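To make the cost warning concrete, here is a rough sketch of how a per-call estimate can be computed from token counts. The function name `estimate_cost`, the default per-1K-token prices (GPT-4 8K launch pricing, which may be out of date), and the token counts are all assumptions for illustration; this is not scrapeghost's own cost logic, and the token counts were simply picked so the result matches the $0.36 figure.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_1k: float = 0.03,      # assumed USD price, may be outdated
    completion_price_per_1k: float = 0.06,  # assumed USD price, may be outdated
) -> float:
    """Estimate the USD cost of one API call from its token counts."""
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    # Round to avoid floating-point noise in the reported figure.
    return round(prompt_cost + completion_cost, 4)


# Hypothetical page: 4,000 tokens of HTML in, 4,000 tokens of JSON out.
cost = estimate_cost(prompt_tokens=4000, completion_tokens=4000)
print(f"${cost:.2f}")  # prints "$0.36"
```

Always check the current OpenAI pricing page before relying on numbers like these.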
The purpose of this library is to provide a convenient interface for exploring web scraping with GPT.
While the bulk of the work is done by the GPT model, scrapeghost provides a number of features to make it easier to use.
Python-based schema definition - Define the shape of the data you want to extract as any Python object, with as much or as little detail as you want.
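As an illustration of what "any Python object" might look like, the sketch below describes the desired output as a plain dictionary and serializes it to JSON so it can be embedded in a prompt. The field names, the helper `schema_to_prompt`, and the prompt wording are hypothetical and are not taken from scrapeghost; consult the documentation for the library's actual interface.

```python
import json

# Hypothetical schema: keys are the fields to extract, values are loose
# type hints. Nested lists and dicts describe nested data.
schema = {
    "name": "string",
    "url": "url",
    "offices": [{"name": "string", "phone": "string"}],
}


def schema_to_prompt(schema: object) -> str:
    """Render a schema object as instructions for a language model.

    The wording here is an assumption for illustration only.
    """
    return "Return JSON matching this shape:\n" + json.dumps(schema, indent=2)


print(schema_to_prompt(schema))
```

Because the schema is ordinary Python data, it can be as sparse or as detailed as the task requires, and it can be built programmatically.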
Preprocessing
Postprocessing
Use a pydantic schema to validate the response.
Cost Controls