mediachain / apps

Discussion and documentation of apps built on Mediachain

Museums on Mediachain #5

Open denisnazarov opened 7 years ago

denisnazarov commented 7 years ago

Now that 1.0 is live, it is trivial to ingest the world's cultural heritage into Mediachain! See the documentation here.

The goal of this issue is to list candidate datasets for the community to tackle ingesting. Let's produce a write-up for each one, outlining the steps you took to extract the data and ingest it into Mediachain. This will serve as a great tutorial for ingesting and processing different data sources so they can co-exist in Mediachain!

Please comment below if you want to help. If you help ingest a dataset, your name will be listed as a contributor in the node's info!

Feel free to add more sources and include missing links.

marionzualo commented 7 years ago

@denisnazarov I can help with the Tate.

aisaac commented 7 years ago

@denisnazarov, @hugomanguinhas and I can help with Europeana. But we probably can't do more than provide dumps and some advice on what to do with the data. No direct ingestion work from us for now, I'm afraid.

denisnazarov commented 7 years ago

@aisaac @hugomanguinhas awesome news! please post the dumps here and hopefully someone from the community will give it a shot.

UPDATE: MoMA and Rijksmuseum data is up and running! The museum node now has over 600k records.

$ mcclient listPeers -i
QmeiY2eHMwK92Zt6X4kUUC3MsjMmVb2VnGZ17DhnhRPCEQ -- Metadata for CC images from DPLA, 500px, and pexels; operated by Mediachain Labs.
QmZ6dckUhRouVr6AsBTpK6vMLVpcz1KAeJAJVQEZQ5gCek -- Metadata for CC images from flickr; operated by Mediachain Labs.
QmTVsocFCSEdPyM8dZ734GRhjvvmYyL9fShyezVkbPj17E -- Curated Museum Metadata; Operated by Mediachain Labs
$ mcclient query -r QmTVsocFCSEdPyM8dZ734GRhjvvmYyL9fShyezVkbPj17E "SELECT namespace FROM *"
"mediachain.schemas"
"museums.moma.artists"
"museums.moma.artworks"
"museums.rijksmuseum.artworks"
$ mcclient query -r QmTVsocFCSEdPyM8dZ734GRhjvvmYyL9fShyezVkbPj17E "SELECT COUNT(*) FROM museums.*"
659741
# merge a MoMA record to your local node
$ mcclient merge QmTVsocFCSEdPyM8dZ734GRhjvvmYyL9fShyezVkbPj17E "SELECT * FROM museums.moma.artworks LIMIT 1"
# look at the metadata object
$ mcclient getData QmWwr4ZfeLJfbWNAuCQfefwo1aHtxC5yjyU8C5WG4DYrYe
{
  "schema": {
    "/": "QmXF4LR4QkuRVh3WQbB56seTX2aPm3Tz7b4Y8heoLAiTkk"
  },
  "data": {
    "AccessionNumber": "885.1996",
    "Artist": [
      "Otto Wagner"
    ],
    "ArtistBio": [
      "Austrian, 1841\u20131918"
    ],
    "BeginDate": [
      1841
    ],
    "Cataloged": "Y",
    "Classification": "Architecture",
    "ConstituentID": [
      6210
    ],
    "CreditLine": "Fractional and promised gift of Jo Carole and Ronald S. Lauder",
    "Date": "1896",
    "DateAcquired": "1996-04-09",
    "Department": "Architecture & Design",
    "Dimensions": "19 1/8 x 66 1/2\" (48.6 x 168.9 cm)",
    "EndDate": [
      1918
    ],
    "Gender": [
      "Male"
    ],
    "Height (cm)": 48.6,
    "Medium": "Ink and cut-and-pasted painted pages on paper",
    "Nationality": [
      "Austrian"
    ],
    "ObjectID": 2,
    "ThumbnailURL": "http://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167",
    "Title": "Ferdinandsbr\u00fccke Project, Vienna, Austria, Elevation, preliminary version",
    "URL": "http://www.moma.org/collection/works/2",
    "Width (cm)": 168.9
  }
}
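
For anyone scripting against these records, here is a minimal Python sketch of reading an object like the one returned by `mcclient getData` above. It assumes the JSON has already been captured as a string. `summarize_record` is a hypothetical helper for illustration, not part of mcclient, and the sample is a trimmed copy of the record above.

```python
import json

# Trimmed copy of the MoMA record shown above (only a few fields kept).
RECORD_JSON = r"""
{
  "schema": {"/": "QmXF4LR4QkuRVh3WQbB56seTX2aPm3Tz7b4Y8heoLAiTkk"},
  "data": {
    "Artist": ["Otto Wagner"],
    "Date": "1896",
    "ObjectID": 2,
    "Title": "Ferdinandsbr\u00fccke Project, Vienna, Austria, Elevation, preliminary version",
    "URL": "http://www.moma.org/collection/works/2"
  }
}
"""


def summarize_record(raw: str) -> dict:
    """Return the schema link and a few display fields from one record."""
    record = json.loads(raw)
    data = record["data"]
    return {
        # The schema is referenced by hash under the "/" key.
        "schema": record["schema"]["/"],
        "title": data.get("Title"),
        # Artist is a list in the MoMA records, so join it for display.
        "artists": ", ".join(data.get("Artist", [])),
        "date": data.get("Date"),
        "url": data.get("URL"),
    }


if __name__ == "__main__":
    summary = summarize_record(RECORD_JSON)
    print(summary["title"], "by", summary["artists"], f"({summary['date']})")
```

Note that several MoMA fields (`Artist`, `Nationality`, `BeginDate`, and so on) are lists, so code that expects a single value needs to handle that.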

Tutorials

We've got some early documentation up on automatically generating a schema and ingesting data! We're working on fleshing it out in more depth today.

aisaac commented 7 years ago

@denisnazarov cc @hugomanguinhas ok cool stuff. let's see how to proceed on this sometime next week!

denisnazarov commented 7 years ago

MoMA tutorial is up here. Please try it and let us know if you have any issues or questions!

diane7C8J commented 7 years ago

Nationalmuseum releases 3,000 images on Wikimedia Commons http://www.nationalmuseum.se/wikimediacommonseng

diane7C8J commented 7 years ago

More than 25,000 images of artworks on the SMK website are in the Public Domain http://www.smk.dk/en/use-of-images-and-text/free-download-of-artworks/

denisnazarov commented 7 years ago

@DianeDrubay awesome, keep em coming! added to list above

denisnazarov commented 7 years ago

The Cooper Hewitt dataset is now live!

$ mcclient query "SELECT namespace FROM museums.*"
"museums.cooperhewitt.objects"
"museums.moma.artists"
"museums.moma.artworks"
"museums.rijksmuseum.artworks"

I wrote down the steps I took to ingest it here.

denisnazarov commented 7 years ago

Big thanks to @marionzualo and @pyython for adding data from Tate and Brooklyn Museum, respectively!

$ mcclient query -r QmTVsocFCSEdPyM8dZ734GRhjvvmYyL9fShyezVkbPj17E "SELECT namespace FROM *"
"mediachain.schemas"
"museums.brooklynmuseum.artists"
"museums.brooklynmuseum.collections"
"museums.brooklynmuseum.exhibitions"
"museums.brooklynmuseum.geographicallocations"
"museums.brooklynmuseum.museumlocations"
"museums.brooklynmuseum.objects"
"museums.cooperhewitt.objects"
"museums.moma.artists"
"museums.moma.artworks"
"museums.rijksmuseum.artworks"
"museums.tate.artists"
"museums.tate.artworks"
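
As the list grows, it can help to group namespaces by museum. Below is a minimal Python sketch that does this from the query output above, assuming the output has been captured as text with one quoted namespace per line. `group_by_museum` is a hypothetical helper, and the sample is a shortened copy of the list above.

```python
from collections import defaultdict

# Output lines as printed by the namespace query above (shortened).
QUERY_OUTPUT = '''
"mediachain.schemas"
"museums.brooklynmuseum.artists"
"museums.brooklynmuseum.objects"
"museums.cooperhewitt.objects"
"museums.moma.artists"
"museums.moma.artworks"
"museums.rijksmuseum.artworks"
"museums.tate.artists"
"museums.tate.artworks"
'''


def group_by_museum(output: str) -> dict:
    """Map each museum name to the last segments of its namespaces."""
    groups = defaultdict(list)
    for line in output.splitlines():
        namespace = line.strip().strip('"')
        parts = namespace.split(".")
        # Keep only namespaces shaped like museums.<museum>.<collection>.
        if len(parts) == 3 and parts[0] == "museums":
            groups[parts[1]].append(parts[2])
    return dict(groups)


if __name__ == "__main__":
    for museum, collections in sorted(group_by_museum(QUERY_OUTPUT).items()):
        print(museum, collections)
```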

@marionzualo also made an awesome write-up documenting his steps to publish Tate data here.

The museum node is growing to be quite a collection! 💥

hugomanguinhas commented 7 years ago

Hi all,

I generated a dump of the metadata for 3 of our collections:

Here is the file: https://www.dropbox.com/sh/0f8t6xxlvtbdvlt/AABAjRF52z_IXXa5B0emFWOna?dl=0

It's a zip file containing one file per metadata record, formatted in JSON-LD. I can also generate it in other RDF formats, or with a specific JSON-LD context, if that makes it easier to process. Btw, since DPLA also uses EDM, it could make things easier if our dumps follow the same format as theirs. Does anyone happen to know which format that is?

Best regards, Hugo
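
A dump shaped like this (a zip with one JSON document per record) could be read along these lines. This is only a sketch: the file names and fields inside the real archive are not known here, so the demo builds a tiny in-memory zip with made-up placeholder records. `iter_records` and `make_sample_zip` are hypothetical helpers.

```python
import io
import json
import zipfile


def iter_records(zip_bytes: bytes):
    """Yield (member name, parsed JSON document) for each file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as member:
                yield name, json.load(member)


def make_sample_zip() -> bytes:
    """Build a small in-memory zip with placeholder records (not real data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("record1.json", json.dumps({"@id": "urn:example:1", "title": "Sample A"}))
        archive.writestr("record2.json", json.dumps({"@id": "urn:example:2", "title": "Sample B"}))
    return buffer.getvalue()


if __name__ == "__main__":
    for name, document in iter_records(make_sample_zip()):
        print(name, document["@id"])
```

Reading records one at a time keeps memory use flat, which matters if the archive holds many files.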

IsabelleReusa commented 7 years ago

There is an API here for data on 370,000 artworks from French national museums: Louvre, Orsay, Versailles, Pompidou, the Renaissance museum, the Middle Ages museum, Guimet for Asian arts, Sèvres and Limoges for porcelain collections... The data is in French and English, linked to authority IDs and Wikipedia pages, and places are geolocated.

IsabelleReusa commented 7 years ago

Also an image collection (in French only) of 54,000+ images from the Albert Kahn collection: photographs from around the world, taken in the late 19th and early 20th centuries. Beautiful stuff in open data. Here