SILKNOW crawler that collects metadata records describing silk material from various museums.
First, install the dependencies with npm:
npm install
The crawler takes one parameter: the name of the museum to crawl. For example:
npm start -- mfa-boston
Available parameters:
Parameter | Description |
---|---|
--no-files | Do not download files such as photos |
--no-records | Do not write the JSON records |
--list-fields | List the unique fields found in the JSON records. Also accepts a --format parameter ("md" or "markdown" for Markdown, "json" for JSON; defaults to Markdown) |
--check-images | Re-download images marked with the hasError flag |
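
To make the --list-fields description more concrete, here is a minimal sketch of how unique field names could be gathered from a set of records and printed as Markdown or JSON. It is not the crawler's actual implementation: the function names `listUniqueFields` and `formatFields`, the sample records, and the assumption that each record is a flat object whose top-level keys are its fields are all hypothetical.

```javascript
// Hypothetical sketch: not the crawler's actual --list-fields implementation.
// Assumes each record is a flat object whose top-level keys are its fields.
function listUniqueFields(records) {
  const fields = new Set();
  for (const record of records) {
    for (const key of Object.keys(record)) {
      fields.add(key);
    }
  }
  return [...fields].sort();
}

// Render the field list as a Markdown bullet list (the default) or as JSON.
function formatFields(fields, format = 'markdown') {
  if (format === 'json') {
    return JSON.stringify(fields, null, 2);
  }
  if (format === 'md' || format === 'markdown') {
    return fields.map((field) => `- ${field}`).join('\n');
  }
  throw new Error(`Unknown format: ${format}`);
}

// Made-up sample records, for illustration only.
const sampleRecords = [
  { id: '1', title: 'Silk fragment', url: 'https://example.org/1' },
  { id: '2', title: 'Brocade', images: [] },
];
console.log(formatFields(listUniqueFields(sampleRecords)));
```

Using a `Set` keeps each field name once regardless of how many records contain it; the sort only makes the output stable between runs.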
- artic: Art Institute of Chicago
- ceres-mcu: Red Digital de Colecciones de Museos de España
- el-tesoro: Museo de Arte Sacro El Tesoro de la Concepción
- europeana: Europeana
- gallica: Gallica
- garin: Garín 1820
- imatex: Centre de Documentació i Museu Tèxtil
- joconde: Joconde Database of French Museum Collections
- les-arts-decoratifs: Musée des Arts Décoratifs
- met-museum: The Metropolitan Museum of Art
- mfa-boston: Boston Museum of Fine Arts
- mobilier-international: Collection of the Mobilier national in France
- mtmad: Musée des Tissus
- paris-musees: Paris Musées
- risd-museum: Rhode Island School of Design Museum
- smithsonian: Smithsonian
- unipa: Sicily Cultural Heritage
- vam: Victoria and Albert Museum
- venezia: Musei di Venezia
- versailles: Versailles

Crawled JSON structure of each museum can be found here
The UNIPA crawler parses local files only. It requires a database.json along with an images folder. The data has to be stored in data/unipa/resources.
Link to the dataset: https://www.dropbox.com/sh/a8zzv22r59q67eq/AAB4SOAGf1byLFwakYkzbcYFa?dl=0
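
Before running the UNIPA crawler, it can help to confirm that the expected files are in place. The following is a minimal sketch of such a check; `findMissingResources` is a hypothetical helper that is not part of the crawler, and the script assumes it is run from the repository root.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of the crawler): returns the names of the
// required entries that are missing from a resources directory.
function findMissingResources(resourcesDir, requiredFiles, requiredDirs) {
  const missing = [];
  for (const name of requiredFiles) {
    const target = path.join(resourcesDir, name);
    if (!fs.existsSync(target) || !fs.statSync(target).isFile()) {
      missing.push(name);
    }
  }
  for (const name of requiredDirs) {
    const target = path.join(resourcesDir, name);
    if (!fs.existsSync(target) || !fs.statSync(target).isDirectory()) {
      missing.push(name);
    }
  }
  return missing;
}

// Check the UNIPA layout described above, relative to the current directory.
const unipaDir = path.join('data', 'unipa', 'resources');
const missing = findMissingResources(unipaDir, ['database.json'], ['images']);
if (missing.length > 0) {
  console.log(`Missing from ${unipaDir}: ${missing.join(', ')}`);
} else {
  console.log(`${unipaDir} contains the expected entries`);
}
```

The helper only checks that the entries exist and have the right type; it does not validate the contents of database.json.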
The Paris Musées API requires a token, which can be generated by following the Paris Musées API documentation.
Once a token has been obtained, add the environment variable PARIS_MUSEES_TOKEN=<token> (replacing <token> with the token) before running the crawler.
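
As an illustration, a script could verify that the variable is set before starting a long crawl. This is only a sketch: `requireEnv` is a hypothetical helper, and how the crawler itself reads PARIS_MUSEES_TOKEN is not described here.

```javascript
// Hypothetical helper (not part of the crawler): read a required environment
// variable and fail early with a readable message when it is not set.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Environment variable ${name} is not set`);
  }
  return value;
}

try {
  const token = requireEnv('PARIS_MUSEES_TOKEN');
  // Print only the length so the token itself never ends up in logs.
  console.log(`Token found (${token.length} characters)`);
} catch (err) {
  console.error(err.message);
}
```

Passing the environment as a second argument makes the helper easy to exercise without touching the real `process.env`.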
The MET Museum implements an anti-scraping strategy, which requires you to first open this page in a web browser, then open the browser's inspector and type document.cookie in the console to get the cookies. The result should look like this: "incap_ses_XXX_XXXXXXX=abcDEFgHIjkLmNoPQrSTUvWxyZABCDEFGHijklMNOPqrSTUVwXYZAb==".
Finally, add the environment variable MET_MUSEUM_COOKIE="incap_ses_XXX_XXXXXXX=abcDEFgHIjkLmNoPQrSTUvWxyZABCDEFGHijklMNOPqrSTUVwXYZAb==" (replace with your own cookie) before running the crawler.
This cookie is only valid for a limited amount of time, but it should be enough to crawl the entire collection.
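
Because the cookie is copied by hand, a quick shape check can catch a truncated or mistyped value. The sketch below parses the cookie string and looks for an entry whose name starts with incap_ses_, as in the example above. The helpers `parseCookieString` and `hasSessionCookie` are hypothetical and not part of the crawler, and the check cannot tell whether the cookie has expired.

```javascript
// Hypothetical sanity check (not part of the crawler): parse a cookie string
// into name/value pairs.
function parseCookieString(cookieString) {
  const cookies = {};
  for (const part of cookieString.split(';')) {
    const trimmed = part.trim();
    const separator = trimmed.indexOf('=');
    if (separator <= 0) continue; // skip empty or malformed parts
    // Split on the first "=" only: values may end with "=" padding.
    cookies[trimmed.slice(0, separator)] = trimmed.slice(separator + 1);
  }
  return cookies;
}

// True when at least one `incap_ses_` entry with a non-empty value is present.
function hasSessionCookie(cookieString) {
  const cookies = parseCookieString(cookieString);
  return Object.keys(cookies).some(
    (name) => name.startsWith('incap_ses_') && cookies[name] !== ''
  );
}

const cookie = process.env.MET_MUSEUM_COOKIE || '';
console.log(
  hasSessionCookie(cookie)
    ? 'Cookie has the expected shape'
    : 'Cookie is missing or malformed'
);
```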
The Musée d'Art et d'Industrie (St Etienne) crawler parses local files only. It requires an export silknow.tsv file along with a media folder. The data has to be stored in data/musee-st-etienne/resources.
Link to the dataset: https://drive.google.com/drive/folders/1V-p9cJ-lNtUtGHW1ePv_k4rLsd_xbbyb
Add the environment variable DEBUG=silknow:* to also output the debug logs.