jamesturk / scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.
https://jamesturk.github.io/scrapeghost/
Other
1.42k stars 86 forks source link
gpt openai-api webscraping

scrapeghost

scrapeghost logo

scrapeghost is an experimental library for scraping websites using OpenAI's GPT.

Source: https://github.com/jamesturk/scrapeghost

Documentation: https://jamesturk.github.io/scrapeghost/

Issues: https://github.com/jamesturk/scrapeghost/issues

PyPI badge Test badge

Use at your own risk. This library makes considerably expensive calls ($0.36 for a GPT-4 call on a moderately sized page.) Cost estimates are based on the OpenAI pricing page and not guaranteed to be accurate.

Features

The purpose of this library is to provide a convenient interface for exploring web scraping with GPT.

While the bulk of the work is done by the GPT model, scrapeghost provides a number of features to make it easier to use.

Python-based schema definition - Define the shape of the data you want to extract as any Python object, with as much or little detail as you want.

Preprocessing

Postprocessing

Cost Controls