whosonfirst / whosonfirst-placetypes

Where things are (and what they mean) in Who's On First.

Add hierarchical placetype "rank" property #12

Open thisisaaronland opened 6 years ago

thisisaaronland commented 6 years ago

A numeric property that can be used to sort place(type)s, typically in a database setting.
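As a rough illustration of the sorting use case, here is a minimal sketch. The `rankByPlacetype` lookup and the `records` array are hypothetical; the rank values are borrowed from the ranked list posted later in this thread and are not part of the spec.

```javascript
// Hypothetical rank lookup; values follow the ranked list later in this thread.
const rankByPlacetype = { country: 3, region: 6, locality: 10, neighbourhood: 13 }

// Hypothetical records, as they might come back from a database query.
const records = [
  { name: 'Mission District', placetype: 'neighbourhood' },
  { name: 'United States', placetype: 'country' },
  { name: 'San Francisco', placetype: 'locality' },
  { name: 'California', placetype: 'region' }
]

// Sort a copy from the coarsest placetype to the finest by comparing ranks.
const sorted = [...records].sort(
  (a, b) => rankByPlacetype[a.placetype] - rankByPlacetype[b.placetype]
)
const sortedNames = sorted.map(r => r.name)
```

In a database the same idea would presumably be an indexed numeric column used in an ORDER BY clause.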

missinglink commented 5 years ago

I had a crack at this today; it's a fairly simple algorithm:

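  // assumes `placetypes` is an array of placetype objects, each with a `parent`
  // array of ids, and `ids` is a Map from placetype id to placetype object
  // note: the loop never terminates if the parent graph contains a cycle
  // assign ranks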
  while (!placetypes.every(pt => pt.hasOwnProperty('rank'))) {
    placetypes.forEach(pt => {
      // already ranked
      if (pt.hasOwnProperty('rank')) { return }

      // pt has 0 parents
      if (!pt.parent.length) {
        pt.rank = 0
        return
      }

      // not all parents have been ranked yet
      if (!pt.parent.every(pid => ids.get(pid).hasOwnProperty('rank'))) { return }

      // find the highest parent rank
      let parentRanks = pt.parent.map(pid => ids.get(pid).rank)
      let maxParentRank = Math.max.apply(null, parentRanks)

      // assign rank one higher than highest parent
      pt.rank = maxParentRank + 1
    })
  }
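The loop above can be sketched as a self-contained function. This is only a sketch under stated assumptions: the sample placetypes and their parents are illustrative and are not copied from the spec, the name `assignRanks` is hypothetical, and the check for a stalled pass is an addition that the original snippet does not have.

```javascript
// Illustrative sample data (not the real spec): each placetype lists parent ids.
const placetypes = [
  { id: 'planet', parent: [] },
  { id: 'continent', parent: ['planet'] },
  { id: 'empire', parent: ['continent'] },
  { id: 'country', parent: ['continent', 'empire'] },
  { id: 'region', parent: ['country'] }
]

// Lookup table from id to placetype, as the snippet above assumes.
const ids = new Map(placetypes.map(pt => [pt.id, pt]))

const hasRank = pt => Object.prototype.hasOwnProperty.call(pt, 'rank')

// Hypothetical helper: rank = 0 for roots, otherwise highest parent rank + 1.
function assignRanks (placetypes, ids) {
  let remaining = placetypes.filter(pt => !hasRank(pt))
  while (remaining.length > 0) {
    const before = remaining.length
    for (const pt of remaining) {
      // pt has 0 parents
      if (pt.parent.length === 0) { pt.rank = 0; continue }
      // not all parents have been ranked yet (or a parent id is unknown)
      const parents = pt.parent.map(pid => ids.get(pid))
      if (!parents.every(p => p && hasRank(p))) continue
      // assign rank one higher than the highest parent
      pt.rank = Math.max(...parents.map(p => p.rank)) + 1
    }
    remaining = remaining.filter(pt => !hasRank(pt))
    // Added guard: a pass that ranks nothing means a cycle or a missing parent.
    if (remaining.length === before) {
      throw new Error('cannot rank: ' + remaining.map(pt => pt.id).join(', '))
    }
  }
  return placetypes
}

assignRanks(placetypes, ids)
```

With this sample data `country` ends up one rank below `empire`, because its highest-ranked parent wins.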

missinglink commented 5 years ago

Although with this ranking algo there are some duplicate ranks:

rank 0: [ 'planet', 'metroarea', 'constituency' ]
rank 1: [ 'continent', 'ocean' ]
rank 3: [ 'country', 'dependency' ]
rank 4: [ 'disputed', 'timezone', 'marinearea', 'marketarea' ]
rank 11: [ 'borough', 'postalcode' ]
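A listing like the one above could be produced by grouping placetypes by rank. The following is a minimal sketch: `groupByRank` is a hypothetical helper, and the sample covers only ranks 0 to 3 as they appear in this thread.

```javascript
// Ranks 0 to 3 as reported in this thread.
const ranked = [
  { name: 'planet', rank: 0 },
  { name: 'metroarea', rank: 0 },
  { name: 'constituency', rank: 0 },
  { name: 'continent', rank: 1 },
  { name: 'ocean', rank: 1 },
  { name: 'empire', rank: 2 },
  { name: 'country', rank: 3 },
  { name: 'dependency', rank: 3 }
]

// Hypothetical helper: collect placetype names under their rank.
function groupByRank (placetypes) {
  const groups = new Map()
  for (const pt of placetypes) {
    if (!groups.has(pt.rank)) groups.set(pt.rank, [])
    groups.get(pt.rank).push(pt.name)
  }
  return groups
}

// Keep only the ranks shared by more than one placetype.
const duplicates = [...groupByRank(ranked)].filter(([, names]) => names.length > 1)

for (const [rank, names] of duplicates) {
  console.log(`rank ${rank}:`, names)
}
```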

missinglink commented 5 years ago

I was looking at this again today and I think that while the algorithm isn't perfect, it might highlight some errors in the existing spec:

  1. metroarea should possibly have a list of parent placetypes to restrict how it is used in the hierarchy model.
  2. same for constituency, this doesn't make sense for things above a sovereign nation or political union. outer space maybe? :P
  3. timezone is a very weird thing to rank
  4. disputed is also a weird thing to rank

It also illustrates some interesting things:

  A. continent and ocean can exist at the same rank since they are mutually exclusive
  B. same for country and dependency
  C. same for marinearea and marketarea

Here's the full list using this algo:

00   metroarea           [optional]
00   constituency        [common_optional]
00   planet              [common_optional]
01   ocean               [common_optional]
01   continent           [common]
02   empire              [common_optional]
03   country             [common]
03   dependency          [common_optional]
04   marketarea          [optional]
04   marinearea          [common_optional]
04   timezone            [common_optional]
04   disputed            [common_optional]
05   macroregion         [optional]
06   region              [common]
07   macrocounty         [optional]
08   county              [common_optional]
09   localadmin          [common_optional]
10   locality            [common]
11   postalcode          [common_optional]
11   borough             [common_optional]
12   macrohood           [optional]
13   neighbourhood       [common]
14   microhood           [optional]
15   campus              [common_optional]
16   intersection        [optional]
17   address             [common_optional]
18   building            [common_optional]
19   wing                [optional]
20   concourse           [optional]
21   arcade              [optional]
22   venue               [common_optional]
23   enclosure           [optional]
24   installation        [optional]

cc @thisisaaronland @nvkelso do you think points 1. and 2. are errors in the spec or is that by design?

thisisaaronland commented 5 years ago

I think on a practical level it probably makes sense to assume metro areas are parented by [region,country] and constituency by [country,empire]. Think, EU membership and the like. I'd like to think about it for a day or so, but I am inclined to agree with you.
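To see what those parent lists would do under the "highest parent rank plus one" rule from earlier in the thread, here is a small worked sketch. It assumes the parents keep the ranks from the full list above (empire 2, country 3, region 6); `rankFromParents` is a hypothetical helper, and the parent lists are only the proposal in this comment, not the spec.

```javascript
// Ranks of the proposed parents, taken from the full list above.
const rank = { empire: 2, country: 3, region: 6 }

// Parent lists as proposed in this comment (not yet part of the spec).
const proposedParents = {
  metroarea: ['region', 'country'],
  constituency: ['country', 'empire']
}

// Hypothetical helper: one higher than the highest-ranked parent.
function rankFromParents (parents, rank) {
  return Math.max(...parents.map(p => rank[p])) + 1
}

const metroareaRank = rankFromParents(proposedParents.metroarea, rank)       // 6 + 1 = 7
const constituencyRank = rankFromParents(proposedParents.constituency, rank) // 3 + 1 = 4
```

Under these assumptions both placetypes would move off rank 0: metroarea would sit just below region and constituency just below country.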

On a more think-y think-y level in these examples you are seeing the messy intersection of the liberal-economic nation state and late-capitalist globalized capitalism...

Also, I'm not sure if the numbering scheme above is just an example, but we should make sure the ranks are separated by more than 1 to allow for flexibility in the future.
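One way to leave that kind of room is to multiply each computed rank by a fixed step. This is only a sketch: the step of 10 and the helper name `spreadRank` are assumptions, not something decided in this thread.

```javascript
// Hypothetical step; any value greater than 1 leaves gaps between ranks.
const STEP = 10

// A few dense ranks from the list earlier in this thread.
const denseRanks = { country: 3, macroregion: 5, region: 6 }

// Hypothetical helper: scale a dense rank so consecutive ranks are STEP apart.
function spreadRank (rank, step = STEP) {
  return rank * step
}

const spacedRanks = Object.fromEntries(
  Object.entries(denseRanks).map(([name, rank]) => [name, spreadRank(rank)])
)
// country -> 30, macroregion -> 50, region -> 60
```

With gaps like these, a placetype added later between macroregion and region could take a value such as 55 without renumbering anything else.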