tategallery / collection

Tate Collection metadata
Creative Commons Zero v1.0 Universal

Reconciling places with Getty thesaurus of geographic names #23

Open ajtucker opened 9 years ago

ajtucker commented 9 years ago

Hi,

Most of the artists have birth, death and active places in their records, and the information given suggests that the Getty Thesaurus of Geographic Names (TGN) has been used as a controlled vocabulary.

Now that the Getty TGN has been released as RDF, I've been able to link up most of the place names in the JSON artist records against the Getty TGN in order to figure out the corresponding URIs for the places, and hence get more information, e.g. lat/lon, wider/narrower names of areas, etc. This is a great help when searching and presenting the collection from the point of view of location.

However, I've only managed to get about a 70% success rate so far. Most of the remaining places in the JSON records don't provide enough information to unambiguously identify the corresponding place in TGN. For example, lots of US places share the same name (Providence, Boston, Atlanta, etc.) and are distinguished only by county, state, etc., but the JSON records only provide the country.
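To illustrate the ambiguity, here is a minimal sketch of matching on name plus country alone. The field names (`name`, `country`, `id`), the sample entries, and the `example-N` identifiers are all hypothetical; they are neither the real JSON schema nor real TGN IDs.

```python
from collections import defaultdict

# Hypothetical gazetteer rows standing in for TGN entries.
GAZETTEER = [
    {"id": "example-1", "name": "Providence", "country": "United States"},
    {"id": "example-2", "name": "Providence", "country": "United States"},
    {"id": "example-3", "name": "London", "country": "United Kingdom"},
]


def build_index(gazetteer):
    """Group gazetteer IDs by (name, country), the only keys the records give us."""
    index = defaultdict(list)
    for entry in gazetteer:
        index[(entry["name"], entry["country"])].append(entry["id"])
    return index


def reconcile(place, index):
    """Return (status, result): a single ID, a list of candidates, or None."""
    candidates = index.get((place["name"], place["country"]), [])
    if len(candidates) == 1:
        return ("matched", candidates[0])
    if candidates:
        return ("ambiguous", candidates)
    return ("unmatched", None)


INDEX = build_index(GAZETTEER)
```

With this sample data, `London` resolves to a single candidate, while `Providence` comes back as `ambiguous` with two candidates, which is the case that needs a county or state to resolve.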

Is there any other data lurking anywhere that could help disambiguate these places?

I'd happily add the Getty URIs to the records as a PR.

Cheers, Alex.

richbs commented 9 years ago

Hi Alex,

Thanks very much for this, and for going to such great effort! You are right in suggesting that our place names have their origin in the Getty TGN. Our Collections Management System is seeded with a rather dated version, which is used internally.

Ideally, we'd like to upgrade our data to a more recent version of the TGN but I would also like to see identifiers for what we have being exposed in our open data and web site, so that we can harness the power of the web and contributions like yours. I'll speak to our team about this.

In the meantime, we'd love to see what you've got!

Best regards, Rich

ajtucker commented 9 years ago

Hi Rich,

It looks as though TGN does a good job of keeping stable identifiers for places, so updated revisions should match up OK as long as one keeps track of the numeric ID of the place. Do you know if the TGN IDs are kept in your Collections Management System and, if so, whether you'd be able to surface them in the JSON data?

Of the 30% or so place names I can't easily match up, most of them are ambiguities and the rest seem to be down to differences in place_type or differences in the literal string value (Misr vs. Miṣr, Éire vs. Ireland). These differences could well be down to changes in the TGN over time.
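As a rough sketch of how the literal differences might be handled: stripping diacritics covers cases like Misr vs. Miṣr, but a pair like Éire vs. Ireland is a different name altogether and needs an alias lookup. The `ALIASES` table below is a hypothetical, hand-maintained example, not something taken from TGN.

```python
import unicodedata

# Hypothetical alias table mapping folded variant names to a preferred folded name.
ALIASES = {"eire": "ireland"}


def fold(name):
    """Lower-case a name and strip combining marks (e.g. 'Miṣr' becomes 'misr')."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def canonical(name):
    """Fold the name, then apply the alias table if an entry exists."""
    folded = fold(name)
    return ALIASES.get(folded, folded)
```

Here `fold("Miṣr") == fold("Misr")` holds on its own, whereas `Éire` and `Ireland` only compare equal through `canonical`, so an approach like this would still depend on someone curating the aliases.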

As for putting the links back into the JSON data, we're actually working on transforming the data into Linked Open Data, where it's much easier to state these external relationships, especially now that Getty TGN is also Linked Open Data. Reading over http://www.tate.org.uk/context-comment/blogs/archives-access-project-open-data-brings-beauty-and-insight, you mention this is of interest, so maybe this is a better way for us to contribute?
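For illustration only, a link from an artist to a TGN place could be stated as a single N-Triples line. This sketch assumes TGN URIs have the form `http://vocab.getty.edu/tgn/<numeric ID>`; the artist URI, the predicate, and the ID `1234567` are placeholders rather than real identifiers or an agreed vocabulary.

```python
# Assumed base for TGN URIs; the numeric ID is appended directly.
TGN_BASE = "http://vocab.getty.edu/tgn/"

# Placeholder predicate; a real dataset would pick one from a published vocabulary.
BIRTH_PLACE = "http://example.org/vocab/birthPlace"


def tgn_uri(tgn_id):
    """Build a TGN URI from a numeric ID, rejecting non-numeric input."""
    return f"{TGN_BASE}{int(tgn_id)}"


def place_triple(artist_uri, tgn_id, predicate=BIRTH_PLACE):
    """Format one N-Triples statement linking an artist to a TGN place."""
    return f"<{artist_uri}> <{predicate}> <{tgn_uri(tgn_id)}> ."


EXAMPLE = place_triple("http://example.org/artist/1", 1234567)
```

Because only the numeric ID is stored, the link would survive changes to the place's label or `place_type` between TGN revisions.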

Cheers, Alex.

richbs commented 9 years ago

Dear Alex,

Yes, you're absolutely right: we'd welcome LOD contributions to our open data programme. It's something we'd like to do, so any "way in" would be a great catalyst. Obviously, we'd like to acknowledge any contributions, so I'd be happy to merge in pull requests and give credit in the README.

Thanks again for your interest.

Best regards, Rich