scrapelect is a web scraping language inspired by CSS that turns
a web page into structured JSON data. Select elements with CSS
selectors, apply filters to extract and modify the data you want from
a web page, and get the output in a structured, machine-readable,
interoperable format.
Install the Rust toolchain. Using cargo, run:
$ cargo install scrapelect
to install the scrapelect interpreter.
Write a scrapelect program into a .scrp file. Documentation for the
language can be found in the scrapelect book.
A quick example, title.scrp, retrieves the title of a Wikipedia article:
title: .mw-page-title-main {
  content: $element | text();
};
Run the .scrp file with the URL of the web page to scrape:
$ scrapelect title.scrp "https://en.wikipedia.org/wiki/Cat"
It will output:
{
  "title": {
    "content": "Cat"
  }
}
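Because the output is plain JSON, it can be consumed from other tools. The following is a minimal sketch in Python, not part of scrapelect itself: it assumes the `scrapelect` binary is on `PATH` and that it prints the JSON document to standard output. The helper names `run_scrapelect` and `page_title` are hypothetical and chosen only for illustration.

```python
import json
import subprocess


def run_scrapelect(program_path: str, url: str) -> dict:
    """Run a .scrp program against a URL and parse the JSON it prints.

    Assumption: the interpreter writes its JSON result to stdout and
    exits with a non-zero status on failure.
    """
    result = subprocess.run(
        ["scrapelect", program_path, url],  # same argument order as the command above
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)


def page_title(data: dict) -> str:
    """Extract the title text from the structure produced by title.scrp."""
    return data["title"]["content"]
```

Under those assumptions, `page_title(run_scrapelect("title.scrp", "https://en.wikipedia.org/wiki/Cat"))` would return the `content` string from the output shown above.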
The scrapelect book contains documentation on language concepts and how
to write a scrapelect program.
See the scrapelect book for more information on contributing to
scrapelect.
scrapelect is available under the MIT or Apache 2 licenses, at your
option. Copies of these licenses are included at LICENSE-MIT and
LICENSE-APACHE at the root directory.
scrapelect: scrape + select, also -lect