whosonfirst / whosonfirst-dates

This is where we will think about time for Who's On First documents. Which is hard. Because it can not be denied...

RFI: who cares about time? #1

Open gerwitz opened 8 years ago

gerwitz commented 8 years ago

I am desperately seeking a gazetteer that includes historic place names for use in genealogy tools. WOF seems perfect, but I don't sense any other parties are interested in maintaining obsolete places.

Is anyone else out there worried about, e.g., extending Pelias to include date range queries and return superseded results?

thisisaaronland commented 8 years ago

There are a bunch of us here (at Mapzen) who care a whole lot about historical places and Who's On First has been designed in a way to encourage and allow them. Or failing that, to at least force the issue.

That said, there are only so many hours in the day and just managing the "present" is a lot of work. The historical stuff doesn't always get the time (sorry) or attention that it deserves so the goal is to design things in such a way that we can revisit them later or, ideally, so that history "just happens". Specifically:

Pelias itself does not have native support for either of these things because 1) its developers are also busy managing the present, so tracking things that have been superseded just isn't a priority, and 2) the (hopefully short-term) cost of EDTF is that there aren't (m)any good parsers for converting complex EDTF notation into something that a database can query.
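To make the parsing problem concrete, here is a minimal sketch (not Pelias or WOF code) that turns a small subset of EDTF into a pair of calendar dates a database could index. It assumes only Level 0 dates with four-digit years from 0001 onward (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`) and simple `start/end` intervals, where `..` (open) and an empty side (unknown) are both treated as unbounded; uncertainty markers such as `?` and `~` are not handled. The function names are hypothetical.

```python
import calendar
import re
from datetime import date

# EDTF Level 0 calendar dates: YYYY, YYYY-MM or YYYY-MM-DD (four-digit years only).
_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def date_bounds(value):
    """Return the (earliest, latest) days covered by a single EDTF date string."""
    m = _DATE.match(value)
    if not m:
        raise ValueError(f"unsupported EDTF date: {value!r}")
    year = int(m.group(1))
    if m.group(3):  # full date: a single day
        day = date(year, int(m.group(2)), int(m.group(3)))
        return day, day
    if m.group(2):  # year-month: first to last day of that month
        month = int(m.group(2))
        last = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last)
    return date(year, 1, 1), date(year, 12, 31)  # year only


def edtf_bounds(value):
    """Return (earliest, latest) for a date or a start/end interval.

    None marks an open ("..") or unknown ("") side of an interval.
    """
    if "/" in value:
        start, end = value.split("/", 1)
        lower = None if start in ("", "..") else date_bounds(start)[0]
        upper = None if end in ("", "..") else date_bounds(end)[1]
        return lower, upper
    return date_bounds(value)


if __name__ == "__main__":
    for v in ("1871", "1900-02", "1806/1871-01-18", "1990/.."):
        print(v, edtf_bounds(v))
```

For example, `edtf_bounds("1900-02")` returns the first and last day of February 1900. Collapsing "unknown" and "open" into `None` is a simplification that suits range queries; a real importer would probably want to keep the two distinct.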

That said, Pelias is an open source project and Mike Migurski has done a really great job of making it easy to get set up and installed on a plain-vanilla Linux machine:

http://mike.teczno.com/notes/openaddr/5min-geocoder.html

From there it should be possible to write an "historical" Pelias importer, or just shove historical data into Elasticsearch manually and see what comes out the other end. I suspect that there are a few technical gotchas in working with historical data, but the really hard bits will be the UI and UX considerations for displaying the results. We would love to hear about people's experience teaching Pelias new tricks.
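As a rough illustration of the "date range queries" idea, the sketch below builds an Elasticsearch query body that matches a name and keeps only places whose lifespan overlaps a given window. It assumes each document was indexed with two hypothetical integer fields, `date_lower` and `date_upper` (for example `YYYYMMDD` values), and that places that still exist were stored with a sentinel upper bound; neither field is part of Pelias's actual schema.

```python
import json


def historical_search(name, lower, upper, lower_field="date_lower", upper_field="date_upper"):
    """Build an Elasticsearch query body: match `name`, and keep only documents
    whose [lower_field, upper_field] interval overlaps [lower, upper].

    The field names are hypothetical; dates are assumed to be comparable integers.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": name}}],
                "filter": [
                    # The place began on or before the end of the window...
                    {"range": {lower_field: {"lte": upper}}},
                    # ...and ended on or after the start of the window.
                    {"range": {upper_field: {"gte": lower}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    # Places named "Springfield" that existed at any point during 1850-1870.
    print(json.dumps(historical_search("Springfield", 18500101, 18701231), indent=2))
```

The resulting body could be POSTed to an index's `_search` endpoint. The two `range` filters encode the usual interval-overlap test: a place existed during the window if it began before the window ended and ended after the window began.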

As for EDTF, I started working on a simplified version/port of the CIDOC-CRM temporal libraries for converting dates into integers. I haven't been able to work on it in a while, and it is incomplete and possibly full of bugs, but you can see the work to date here:

https://github.com/whosonfirst/go-whosonfirst-temporal
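For the dates-to-integers step, one common technique (swapped in here for illustration; it is not necessarily what go-whosonfirst-temporal or CIDOC-CRM does) is to count days from a fixed epoch in the proleptic Gregorian calendar. The sketch below follows Howard Hinnant's `days_from_civil` algorithm and uses astronomical year numbering (year 0 is 1 BCE, the convention ISO 8601 uses), so it also works for negative years, which Python's `datetime` cannot represent.

```python
def days_from_civil(year, month, day):
    """Days since 1970-01-01 in the proleptic Gregorian calendar.

    Uses astronomical year numbering: year 0 is 1 BCE, year -1 is 2 BCE.
    Based on Howard Hinnant's days_from_civil algorithm.
    """
    # Treat January and February as the last months of the previous year,
    # so the leap day (if any) falls at the end of the shifted year.
    if month <= 2:
        year -= 1
    era = year // 400                               # 400-year cycle; floor division handles negatives
    yoe = year - era * 400                          # year of era, 0..399
    mp = (month + 9) % 12                           # March = 0, ..., February = 11
    doy = (153 * mp + 2) // 5 + day - 1             # day of (shifted) year, 0..365
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy   # day of era, 0..146096
    return era * 146097 + doe - 719468              # 719468 = days from 0000-03-01 to 1970-01-01


if __name__ == "__main__":
    # Integers preserve chronological order, so intervals can be compared directly.
    print(days_from_civil(1970, 1, 1))  # 0
    print(days_from_civil(-43, 3, 15))  # a day in 44 BCE
```

Because the result is a plain integer, lower and upper bounds derived from EDTF values can be stored in any database that supports numeric range queries.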

gerwitz commented 8 years ago

Thanks for the thoughtful response! Allow me to ramble a bit in the hope you have more thoughts. ;-)

I "simply" want to standardize location fields on WikiTree, with hierarchy and time context.

Because the tech stack is so different and the project is volunteer-run, I don't think self-hosted Pelias will be an option. Also, because centroids are enough for visualization purposes and no gazetteer is rich in historic places yet, the appeal of WOF is more philosophical than technical.

My dream is to harness the active, detail-oriented community of WikiTree to flesh out the WOF data set, temporally.

So I may only use Mapzen Search for geocoding, track the WOF IDs for future use, and ask users to manually enter at least dates, sources, and centroids for any added places. If the collection grows as well as I hope it will, perhaps it can reciprocate as a data source to WOF. I doubt, though, that we'll be able to collect shapes or even trust the centroids.
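A hypothetical record for the manually entered places described above might look something like this. The class name, the validation rules, and the choice of WOF property names to emit (`wof:id`, `wof:name`, `geom:latitude`, `geom:longitude`, `edtf:inception`, `edtf:cessation`) are assumptions for illustration, not an agreed WikiTree or WOF schema; sources are kept on the record but deliberately left out of the WOF mapping.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContributedPlace:
    """A user-entered historical place (hypothetical schema)."""
    name: str
    latitude: float                # user-supplied centroid; may not be trustworthy
    longitude: float
    inception: str = ""            # EDTF string, "" if not entered
    cessation: str = ""            # EDTF string, "" if not entered
    sources: List[str] = field(default_factory=list)  # citations entered by users
    wof_id: Optional[int] = None   # set once matched to an existing WOF record

    def __post_init__(self):
        # Reject centroids that cannot be valid coordinates.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")

    def to_wof_properties(self):
        """Map onto WOF-style property names, omitting empty fields."""
        props = {
            "wof:name": self.name,
            "geom:latitude": self.latitude,
            "geom:longitude": self.longitude,
        }
        if self.wof_id is not None:
            props["wof:id"] = self.wof_id
        if self.inception:
            props["edtf:inception"] = self.inception
        if self.cessation:
            props["edtf:cessation"] = self.cessation
        return props
```

Keeping the WOF ID optional lets a place be recorded before it has been matched, and the mapping method is the only piece that would need to change if WOF's conventions differ from these assumptions.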

gerwitz commented 8 years ago

I should also note here that the only other large-scale historic gazetteer I've encountered is GOV, a German "Ortsdatenbank" (place database) of political entities. I'm not yet clear on their licensing, but they don't yet seem to be a WOF source and might enable bootstrapping.

(Their date formats allow for some ambiguity, but nothing as nuanced as EDTF.)

thisisaaronland commented 8 years ago

If you have historical places and can map those locations to their current WOF ID, then that would be an interesting dataset to investigate and test what will essentially be a linked list of places.
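To make the "linked list" idea concrete: WOF records carry `wof:supersedes` and `wof:superseded_by` properties listing the IDs of the records they replace or were replaced by. The sketch below follows `wof:superseded_by` links from a historical ID to the current record. It assumes the records are already loaded into an in-memory dict keyed by ID, and it treats a record with more than one successor (a place that split) as an error, since that forms a tree rather than a list. The function name and the IDs in the example are hypothetical.

```python
def resolve_current(wof_id, records):
    """Follow wof:superseded_by links from wof_id to the most recent record.

    records: dict mapping ID -> properties dict (assumed in-memory lookup).
    Returns the IDs visited, oldest first.
    """
    chain = [wof_id]
    seen = {wof_id}
    while True:
        successors = records[chain[-1]].get("wof:superseded_by", [])
        if not successors:
            return chain  # nothing supersedes this record: it is the current one
        if len(successors) > 1:
            # A split (one place becoming several) is not a simple linked list.
            raise ValueError(f"{chain[-1]} has multiple successors: {successors}")
        nxt = successors[0]
        if nxt in seen:
            raise ValueError(f"cycle detected at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
```

For example, with hypothetical records `{1: {"wof:superseded_by": [2]}, 2: {"wof:supersedes": [1]}}`, `resolve_current(1, records)` returns `[1, 2]`. Walking `wof:supersedes` in the other direction would recover a place's history instead.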