Tool for archiving and exploring.
Built out of a need to get out of the walled gardens of Pinterest and (much less walled) Pinboard.
Alpha quality at best.
Archivist is built out of three interconnected parts (each package has its own readme file):
archivist-cli
- command line tool for configuration, fetching and querying the data
archivist-ui
- Electron UI built on top of archivist-cli
archivist-*
- various crawlers, "official" ones:
archivist-pinboard
- API-based Pinboard archiving: screenshot and freeze-dry of the original website
archivist-pinterest-crawl
- slowly crawl through Pinterest and archive pin images
npm install -g archivist-cli
npm install -g archivist-pinboard
npm install -g archivist-pinterest-crawl
archivist-ui is not on npm (it should probably be a downloadable dmg, but I didn't get around to it), so to generate the .app and put it in /Applications/ yourself, run:
cd archivist-ui && ./scripts/install.sh
$ archivist config
Config is a JSON object of shape:
{
  "crawler-1": CRAWLER_1_OPTIONS,
  "crawler-2": CRAWLER_2_OPTIONS,
  ...
}
Example config (assuming Pinboard and Pinterest backup):
{
  "archivist-pinterest-crawl": {
    "loginMethod": "cookies",
    "profile": "szymon_k"
  },
  "archivist-pinboard": {
    "apiKey": "API_KEY_FOR_PINBOARD"
  }
}
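To make the shape concrete, here is a minimal sketch that reads a config like the one above and lists which crawlers it configures. The function name `listCrawlers` is hypothetical and not part of archivist-cli; how archivist-cli itself loads crawlers is not described here, so this only illustrates the "crawler name maps to options" layout.

```javascript
// Hypothetical helper: turn the config JSON text into a list of
// { name, options } pairs, one per configured crawler.
function listCrawlers(configText) {
  const config = JSON.parse(configText);
  // The config is expected to be a plain JSON object keyed by crawler name.
  if (config === null || typeof config !== "object" || Array.isArray(config)) {
    throw new TypeError("config must be a JSON object");
  }
  return Object.entries(config).map(([name, options]) => ({ name, options }));
}

const exampleConfig = JSON.stringify({
  "archivist-pinterest-crawl": { loginMethod: "cookies", profile: "szymon_k" },
  "archivist-pinboard": { apiKey: "API_KEY_FOR_PINBOARD" },
});

for (const { name, options } of listCrawlers(exampleConfig)) {
  console.log(name, Object.keys(options).join(", "));
}
```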
archivist-pinterest-crawl supports two login methods: "cookies" (which uses cookies from the local Google Chrome installation) or "password", which requires a plaintext username and password:
"archivist-pinterest-crawl": {
  "loginMethod": "password",
  "username": "PINTEREST_USERNAME",
  "password": "PINTEREST_PASSWORD",
  "profile": "szymon_k"
},
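The two login methods can be checked before running a fetch. The sketch below is an assumption about what a sanity check could look like, not the crawler's own validation: `checkPinterestOptions` is a hypothetical name, and it only encodes the rules stated above ("cookies" or "password", with username and password needed for the latter). Whether "profile" is mandatory is not stated, so it is not checked.

```javascript
// Hypothetical sanity check for the "archivist-pinterest-crawl" options.
// Returns a list of human-readable problems; an empty list means no issue found.
function checkPinterestOptions(options) {
  const problems = [];
  if (options.loginMethod !== "cookies" && options.loginMethod !== "password") {
    problems.push('loginMethod must be "cookies" or "password"');
  }
  if (options.loginMethod === "password") {
    // Password login needs both credentials present as non-empty strings.
    for (const key of ["username", "password"]) {
      if (typeof options[key] !== "string" || options[key] === "") {
        problems.push(`${key} is required when loginMethod is "password"`);
      }
    }
  }
  return problems;
}

console.log(checkPinterestOptions({ loginMethod: "password", username: "PINTEREST_USERNAME" }));
```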
archivist-pinboard requires an API token from https://pinboard.in/settings/password to run properly.
archivist fetch
(might take a long time depending on the size of the archive)
archivist query
archivist query keyboard
query returns ndjson by default; normal JSON can be output using --json
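Since ndjson is one JSON value per line, the default output can be consumed line by line. A minimal sketch, assuming the output has already been captured as a string; the helper name `parseNdjson` and the `title` field in the sample are made up for illustration, and the real fields in query results may differ.

```javascript
// Parse newline-delimited JSON: one JSON value per non-empty line.
function parseNdjson(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Hypothetical sample standing in for the output of `archivist query keyboard`.
const sample = '{"title":"Mechanical keyboard"}\n{"title":"Keyboard layouts"}\n';
const records = parseNdjson(sample);
console.log(records.map((record) => record.title));
```

With --json the whole output should be parseable with a single JSON.parse call instead.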