skybristol / geokb

Data processing workflows for initializing and building the Geoscience Knowledgebase
The Unlicense

Build USGS personnel profile inventory harvester #8

Closed skybristol closed 11 months ago

skybristol commented 11 months ago

This item represents the USGS staff profiles as a source for the GeoKB:

https://geokb.wikibase.cloud/wiki/Item:Q44323

I need to port some previous work that ran a web scraping routine to pull all the personnel profiles. Since this content has to go somewhere and changes regularly, I'm going to experiment with caching it as YAML on the source item's discussion page, where another operation can pick it up for processing.
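Here is a minimal sketch of the caching step. It assumes the inventory is a flat list of profile URL-path strings. It also assumes the item's discussion page is `Item talk:Q44323`, following MediaWiki's default talk-namespace naming for Wikibase's `Item` namespace. The YAML is hand-rolled to stay within the standard library; a library serializer such as PyYAML's `yaml.safe_dump` would normally do this job. The `retrieved` field and the helpers `to_yaml`, `talk_page_title`, and `build_edit_params` are hypothetical. `build_edit_params` only assembles parameters for a MediaWiki Action API `action=edit` request and does not send them.

```python
from typing import Iterable


def to_yaml(source_item: str, retrieved: str, profiles: Iterable[str]) -> str:
    """Serialize a flat, de-duplicated profile inventory as a small YAML document."""

    def quote(value: str) -> str:
        # Double-quote every scalar, escaping backslashes and double quotes,
        # so characters such as ':' or '#' in a path cannot break the YAML.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        f"source_item: {quote(source_item)}",
        f"retrieved: {quote(retrieved)}",  # hypothetical timestamp field
    ]
    unique = sorted(set(profiles))
    if unique:
        lines.append("profiles:")
        lines += [f"  - {quote(p)}" for p in unique]
    else:
        lines.append("profiles: []")  # keep an explicit empty list, not null
    return "\n".join(lines) + "\n"


def talk_page_title(qid: str) -> str:
    # Assumption: Wikibase items live in the "Item" namespace, whose
    # discussion pages use the default "Item talk" namespace.
    return f"Item talk:{qid}"


def build_edit_params(qid: str, yaml_text: str, csrf_token: str) -> dict[str, str]:
    # Hypothetical helper: parameters for a MediaWiki Action API edit request.
    # Actually sending them (a POST to the wiki's api.php) needs a logged-in
    # session and a CSRF token obtained from the API beforehand.
    return {
        "action": "edit",
        "title": talk_page_title(qid),
        "text": yaml_text,
        "summary": "Cache USGS staff profile inventory",
        "token": csrf_token,
        "format": "json",
    }
```

Sorting and de-duplicating before writing keeps the cached page stable between runs, so a page revision only appears when the inventory actually changes.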

skybristol commented 11 months ago

I built this as a notebook and reorganized this part of the codebase into a harvesters folder, where I'll start practicing a little more discipline. The algorithm is fairly straightforward and could run as a microservice on a schedule, simply pulling an inventory of the important part of the URL path for every profile listed at a given time.
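The inventory pass described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the notebook's actual code. It assumes individual profiles live under a `/staff-profiles/` path prefix on the same host as the listing, and that the "important part" of the path is the single segment after that prefix. It also assumes the listing is split into numbered pages. `PROFILE_PREFIX`, `profile_slugs`, and `harvest` are hypothetical names, and page fetching is passed in as a function so the logic can be tested without network access.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin, urlparse

# Assumption: individual profile pages live under this path prefix.
PROFILE_PREFIX = "/staff-profiles/"


class _LinkCollector(HTMLParser):
    """Collect every href attribute found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def profile_slugs(html: str, base_url: str) -> set[str]:
    """Return the path segment after PROFILE_PREFIX for each same-host profile link."""
    parser = _LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    slugs: set[str] = set()
    for href in parser.hrefs:
        parsed = urlparse(urljoin(base_url, href))  # resolve relative links
        if parsed.netloc != host or not parsed.path.startswith(PROFILE_PREFIX):
            continue
        slug = parsed.path[len(PROFILE_PREFIX):].strip("/")
        # Skip the bare prefix and any deeper sub-paths.
        if slug and "/" not in slug:
            slugs.add(slug)
    return slugs


def harvest(fetch: Callable[[int], str], base_url: str, max_pages: int = 500) -> list[str]:
    """Walk numbered listing pages until a page contributes no new profiles."""
    inventory: set[str] = set()
    for page in range(max_pages):
        found = profile_slugs(fetch(page), base_url)
        if not found - inventory:
            break  # empty page, or a repeat of pages already seen
        inventory |= found
    return sorted(inventory)
```

A live `fetch` could wrap `urllib.request.urlopen` around the listing URL with a page parameter; the exact listing URL and pagination parameter are assumptions to check against the site. Stopping when a page adds nothing new, rather than only on an empty page, also guards against listings that keep returning the last page for out-of-range page numbers.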