periodo / periodo-data

Tracking PeriodO data quality issues
http://perio.do
The Unlicense

Periods missing spatial coverage (link to gazetteer) #34

Open rybesh opened 7 years ago

rybesh commented 7 years ago

The few I checked were all from ARIADNE. These appear to have (textual) spatial coverage descriptions, but no associated gazetteer entities. Full list attached.

Missing spatial coverage.xlsx https://github.com/periodo/periodo-data/files/733461/Missing.spatial.coverage.xlsx

atomrab commented 7 years ago

These will all be entries (from ARIADNE, from LoC, and elsewhere) that use textual descriptions that we couldn't match to a gazetteer extent using the user interface for input (e.g. "Sicily", "Rome", "Italy less Sicily, Sardinia, Tuscany, Umbria", "Crete"). Some of these ("Sicily", "Crete", etc.) could, I imagine, be matched up to a Wikidata entry without difficulty. Others are a little trickier -- when LoC says "Rome--History--Titus, 79-81", do they mean a) the city of Rome now, b) the city of Rome in antiquity, or c) the extent of the Roman Empire in AD 79-81? I think it's probably c), but we don't have a historical gazetteer incorporated yet that can give us the Roman Empire as of AD 81 as a shapefile. Others are modern but will be more complicated, since "Italy less Sicily, Sardinia, Tuscany, Umbria" will presumably involve hand-selecting all Italian regions except those four (I wonder if we could have a pulldown with expanding arrows and checkboxes, so you could expand Italy, check all, and then clear out those four...).
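As a rough illustration of the split described above, here is a minimal Python sketch that links descriptions matching a gazetteer label exactly and sets everything else aside for manual review. The `triage` function and the tiny gazetteer are hypothetical; the Wikidata IDs are the ones quoted later in this thread, and nothing here reflects how the PeriodO client actually matches places.

```python
def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so 'Italy ' and 'italy' compare equal."""
    return " ".join(text.casefold().split())


def triage(descriptions: list[str], gazetteer: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split descriptions into exact label matches and ones needing manual review.

    `gazetteer` maps a place label to a Wikidata ID (hypothetical structure).
    """
    index = {normalize(label): qid for label, qid in gazetteer.items()}
    matched: dict[str, str] = {}
    unmatched: list[str] = []
    for description in descriptions:
        qid = index.get(normalize(description))
        if qid is not None:
            matched[description] = qid
        else:
            unmatched.append(description)
    return matched, unmatched


# Illustrative gazetteer only; IDs as quoted elsewhere in this thread.
gazetteer = {"Italy": "Q38", "Tuscany": "Q1273", "Umbria": "Q1280"}
descriptions = [
    "Tuscany",
    "Italy less Sicily, Sardinia, Tuscany, Umbria",
    "Rome--History--Titus, 79-81",
]
matched, unmatched = triage(descriptions, gazetteer)
print(matched)
print(unmatched)
```

Exact matching is deliberately conservative: composite or historically ambiguous descriptions fall through to the manual pile rather than being linked to the wrong place.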

rybesh commented 4 years ago

As of July 10 2020 the following authorities have at least one period missing spatial coverage links to gazetteers:

http://n2t.net/ark:/99152/p04ff5k
http://n2t.net/ark:/99152/p05hrsf
http://n2t.net/ark:/99152/p06c6g3
http://n2t.net/ark:/99152/p06g4sd
http://n2t.net/ark:/99152/p07rxtq
http://n2t.net/ark:/99152/p088hzz
http://n2t.net/ark:/99152/p08m57h
http://n2t.net/ark:/99152/p0b6j5m
http://n2t.net/ark:/99152/p0bd664
http://n2t.net/ark:/99152/p0c3bh8
http://n2t.net/ark:/99152/p0d59sh
http://n2t.net/ark:/99152/p0dfxxp
http://n2t.net/ark:/99152/p0dkm29
http://n2t.net/ark:/99152/p0ff3dt
http://n2t.net/ark:/99152/p0fk6s4
http://n2t.net/ark:/99152/p0fs84x
http://n2t.net/ark:/99152/p0g4k29
http://n2t.net/ark:/99152/p0hsq83
http://n2t.net/ark:/99152/p0jf288
http://n2t.net/ark:/99152/p0jgnvq
http://n2t.net/ark:/99152/p0jk4xk
http://n2t.net/ark:/99152/p0jrrjb
http://n2t.net/ark:/99152/p0mwsd7
http://n2t.net/ark:/99152/p0nt759
http://n2t.net/ark:/99152/p0pf7xr
http://n2t.net/ark:/99152/p0pgmrb
http://n2t.net/ark:/99152/p0pv57g
http://n2t.net/ark:/99152/p0qhb66
http://n2t.net/ark:/99152/p0s2rwk
http://n2t.net/ark:/99152/p0s5mgk
http://n2t.net/ark:/99152/p0xkgmr
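For reference, a list like this could be produced with something along the lines of the Python sketch below. It assumes the dataset is a JSON object with an `authorities` map, each authority holding a `periods` map, and each period carrying a `spatialCoverage` list of gazetteer links; those key names are my assumption about the data layout and should be checked against the actual dataset. The sample IDs are made up.

```python
from typing import Any


def authorities_missing_coverage(dataset: dict[str, Any]) -> list[str]:
    """Return sorted IDs of authorities with at least one period lacking gazetteer links.

    Assumed layout: dataset["authorities"][id]["periods"][id]["spatialCoverage"].
    A missing or empty "spatialCoverage" list counts as missing.
    """
    missing = []
    for authority_id, authority in dataset.get("authorities", {}).items():
        periods = authority.get("periods", {})
        if any(not period.get("spatialCoverage") for period in periods.values()):
            missing.append(authority_id)
    return sorted(missing)


# Tiny hand-written sample; identifiers are placeholders, not real PeriodO IDs.
sample = {
    "authorities": {
        "authority-a": {
            "periods": {
                "period-a1": {
                    "spatialCoverageDescription": "Sicily",
                    "spatialCoverage": [],
                },
            }
        },
        "authority-b": {
            "periods": {
                "period-b1": {
                    "spatialCoverageDescription": "Italy",
                    "spatialCoverage": [
                        {"id": "http://www.wikidata.org/entity/Q38", "label": "Italy"}
                    ],
                },
            }
        },
    }
}
print(authorities_missing_coverage(sample))
```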

atomrab commented 4 years ago

on it

rybesh commented 4 years ago

Keep in mind that we've been augmenting the gazetteers on an ad-hoc basis, so if you come across spatial coverage descriptions with no plausible corresponding gazetteer entries, let me know so I can look for Wikidata records to add to our gazetteers.

atomrab commented 4 years ago

Ah, I didn’t realize there was more out there. French and Greek cities (e.g., Paris and Athens) and French, Greek, and Romanian regions (e.g., Burgundy, Transylvania, and the Peloponnese) have jumped out at me so far. As I work through the LCSH, I’m going to hit a whole lot of major European cities in the WWII “bombardment” periods.

atomrab commented 4 years ago

Italian cities are missing too, but less consistently (there's Bologna but not Siena, for example).

atomrab commented 4 years ago

We could use some historical German regions, too: the Palatinate, Prussia, etc.

atomrab commented 4 years ago

Canadian provinces.

atomrab commented 4 years ago

I'm looking at my own Chersonesos periodization, which lacks gazetteer links because we don't have ancient cities, or Sevastopol, or Crimea. But Wikidata does have Chersonesos: https://www.wikidata.org/wiki/Q638445. And it occurs to me, as a shortcut: what if we did a lookup for Wikidata entries that had Pleiades IDs? We'd get a whole bunch of ancient and archaeological sites that way.
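A sketch of what that lookup might look like, in Python: it builds a SPARQL query for the Wikidata Query Service and parses the standard SPARQL JSON results format. I'm assuming `P1584` is the Wikidata property for Pleiades IDs, which should be verified before use; the function names are hypothetical, the sample response is hand-written, and its Pleiades value is a placeholder rather than a real ID. No network call is made here.

```python
# Assumption: P1584 is the Wikidata "Pleiades ID" property; verify before relying on it.
PLEIADES_ID_PROPERTY = "P1584"


def build_query(limit: int = 100) -> str:
    """Build a SPARQL query for Wikidata items that carry a Pleiades ID."""
    return (
        "SELECT ?place ?placeLabel ?pleiades WHERE {\n"
        f"  ?place wdt:{PLEIADES_ID_PROPERTY} ?pleiades .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def parse_bindings(results: dict) -> list[tuple[str, str, str]]:
    """Turn a SPARQL JSON result into (Wikidata ID, label, Pleiades ID) tuples."""
    rows = []
    for binding in results["results"]["bindings"]:
        qid = binding["place"]["value"].rsplit("/", 1)[-1]
        label = binding.get("placeLabel", {}).get("value", "")
        rows.append((qid, label, binding["pleiades"]["value"]))
    return rows


# Hand-written sample in the SPARQL JSON results shape; "000000" is a placeholder.
sample_response = {
    "results": {
        "bindings": [
            {
                "place": {"type": "uri", "value": "http://www.wikidata.org/entity/Q638445"},
                "placeLabel": {"type": "literal", "value": "Chersonesos"},
                "pleiades": {"type": "literal", "value": "000000"},
            }
        ]
    }
}
print(build_query(limit=5))
print(parse_bindings(sample_response))
```

The query text could be pasted into the Wikidata Query Service as-is; the full result set would presumably be large, so some filtering by region or place type would likely be needed before adding anything to the gazetteers.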

rybesh commented 4 years ago

For anything like Canadian provinces or French regions, or Pleiades / Wikidata, we can pretty easily do it. Things like "Italian cities" are tougher without some way of deciding which cities. All Italian cities is too many. We could do the biggest Italian cities, but those aren't necessarily the ones with periods defined in relation to them. For things like that, I think we should continue to add them as needed. But if you can keep going through and adding spatial coverage where we do have an appropriate place, then I can fairly easily look at the remaining ones and figure out how to add them (the current gazetteers are the result of looking at what was missing from the DBpedia set).

rybesh commented 3 years ago

As of June 9 2021 the following authorities have at least one period missing spatial coverage links to gazetteers:

http://n2t.net/ark:/99152/p05hrsf
http://n2t.net/ark:/99152/p07rxtq
http://n2t.net/ark:/99152/p08m57h
http://n2t.net/ark:/99152/p09jqrd
http://n2t.net/ark:/99152/p0b6j5m
http://n2t.net/ark:/99152/p0bd664
http://n2t.net/ark:/99152/p0c3bh8
http://n2t.net/ark:/99152/p0d59sh
http://n2t.net/ark:/99152/p0dfxxp
http://n2t.net/ark:/99152/p0dkm29
http://n2t.net/ark:/99152/p0ff3dt
http://n2t.net/ark:/99152/p0fk6s4
http://n2t.net/ark:/99152/p0fs84x
http://n2t.net/ark:/99152/p0g4k29
http://n2t.net/ark:/99152/p0hsq83
http://n2t.net/ark:/99152/p0jf288
http://n2t.net/ark:/99152/p0jgnvq
http://n2t.net/ark:/99152/p0jk4xk
http://n2t.net/ark:/99152/p0jrrjb
http://n2t.net/ark:/99152/p0mwsd7
http://n2t.net/ark:/99152/p0ndpcq
http://n2t.net/ark:/99152/p0nt759
http://n2t.net/ark:/99152/p0pf7xr
http://n2t.net/ark:/99152/p0pgmrb
http://n2t.net/ark:/99152/p0pv57g
http://n2t.net/ark:/99152/p0qhb66
http://n2t.net/ark:/99152/p0s2rwk
http://n2t.net/ark:/99152/p0s5mgk
http://n2t.net/ark:/99152/p0xkgmr

rybesh commented 3 years ago

Here's a spreadsheet with the complete list of periods currently missing gazetteer links:

https://docs.google.com/spreadsheets/d/1qo522zQ_TBhkkqtvrjNuJ_C5jS4cFydX16lk23LReDQ/edit?usp=sharing

atomrab commented 3 years ago

Do we have any sense of how to deal with idiosyncratic exclusive expressions of spatial coverage, like "Italy not including Sicily"? Do we attach the more inclusive "Italy", or do we not link? I guess the question is whether we prefer false positives or false negatives.

ylan1 commented 3 years ago

Yes, I wondered about that, too. For "Italy less Sicily," I was thinking of including all 19 first-order administrative divisions of present-day Italy, other than Sicily, i.e., sibling classes of Sicily; all already in the existing PeriodO gazetteers.

Abruzzo (Q1284)
Aosta Valley (Q1222)
Apulia (Q1447)
Basilicata (Q1452)
Calabria (Q1458)
Campania (Q1438)
Emilia-Romagna (Q1263)
Friuli-Venezia Giulia (Q1250)
Lazio (Q1282)
Liguria (Q1256)
Lombardy (Q1210)
Molise (Q1443)
Piedmont (Q1216)
Sardinia (Q1462)
Marche (Q1279)
Trentino-South Tyrol (Q1237)
Tuscany (Q1273)
Umbria (Q1280)
Veneto (Q1243)
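To make the "parent less excluded regions" idea concrete, here is a small Python sketch that expands such a description into the list of covered regions. The `covered_regions` helper is hypothetical. The IDs are the ones listed above, except Sicily (Q1460), which I added so that all 20 regions are present and which should be double-checked.

```python
# Wikidata IDs as listed in this thread. Sicily (Q1460) is not in the list above;
# it is added here as an assumption so every region can be named as an exclusion.
ITALIAN_REGIONS = {
    "Abruzzo": "Q1284", "Aosta Valley": "Q1222", "Apulia": "Q1447",
    "Basilicata": "Q1452", "Calabria": "Q1458", "Campania": "Q1438",
    "Emilia-Romagna": "Q1263", "Friuli-Venezia Giulia": "Q1250",
    "Lazio": "Q1282", "Liguria": "Q1256", "Lombardy": "Q1210",
    "Marche": "Q1279", "Molise": "Q1443", "Piedmont": "Q1216",
    "Sardinia": "Q1462", "Sicily": "Q1460", "Trentino-South Tyrol": "Q1237",
    "Tuscany": "Q1273", "Umbria": "Q1280", "Veneto": "Q1243",
}


def covered_regions(regions: dict[str, str], excluded: list[str]) -> dict[str, str]:
    """Return the regions that remain after removing the excluded ones.

    Raises ValueError for an exclusion that is not a known region, so a typo
    cannot silently leave a region in the result.
    """
    unknown = [name for name in excluded if name not in regions]
    if unknown:
        raise ValueError(f"Unknown region(s): {', '.join(unknown)}")
    return {name: qid for name, qid in regions.items() if name not in excluded}


# "Italy less Sicily, Sardinia, Tuscany, Umbria"
covered = covered_regions(ITALIAN_REGIONS, ["Sicily", "Sardinia", "Tuscany", "Umbria"])
print(len(covered))
print(sorted(covered.values()))
```

Excluding only Sicily yields the same 19 regions as the list above.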

My impression from Ryan's email sent to me on Friday (11 June) is that you only need a rough approximation of the geographical scope, since there is "no such thing as 'pure' spatial entities vs. 'pure periods.'"

Whichever approach (link to Italy (Q38) instead, or not to link) and the level of granularity you want me to implement is up to you and Ryan to decide. Let me know what you think.

ylan1 commented 3 years ago

I updated the spreadsheet yesterday. Let me know if I need to make further changes. Thanks!

atomrab commented 3 years ago

@ylan1, I meant to weigh in on this -- for spatial coverage statements that explicitly don't include a particular region, I think it's preferable to follow the approach here and list all the regions that are covered, leaving out the ones that aren't (the same thing applies to Greece not including Crete, which I think we also have).

Ryan is correct that rough approximations are fine, but the problem we get into in this situation is that the project in question also provided separate definitions for Sicily, and if we identify "Italy but not Sicily" as Q38 (Italy including Sicily) and Sicily as Sicily, we get two sets of periods from the same project for Sicily, which might create confusion down the road. So if a project is specific about regional distinctions, I think we should try to reflect those in the granularity of the gazetteer entities we include.