skybristol / geokb

Data processing workflows for initializing and building the Geoscience Knowledgebase
The Unlicense

Incorporate all minerals from the GSO source #42

Open skybristol opened 9 months ago

skybristol commented 9 months ago

This process ended up being fairly messy, so the code may be a little hard to follow. I tried to capture notes throughout the pipeline on the major decisions made. The important part is to look at the results in the GeoKB and make sure we've not made any egregious errors or introduced anything too weird.

One of the most challenging parts was dealing with the Nickel-Strunz classification, which we decided we wanted to have in play at some level. Strictly speaking, this is a classification scheme, though one that defies some of the conventions for OWL classification. The GSO dealt with it by simply including the lowest-level code (which implies the higher-level codes), making up a label from parts of the description, and incorporating it as a property. Mindat includes the Strunz classification but has purportedly made a few tweaks here and there, which I don't fully understand. I opted to find the intersection between the two and incorporated it under "mineral material" as a classification scheme. I also included an "unclassified mineral material" area to gather everything else.
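As a rough illustration of the two ideas above (a lowest-level code implying its higher levels, and keeping only what both sources agree on), here is a minimal sketch. It is not the code from the notebooks: the function names, the sample dictionaries, and the reading of "intersection" as "same code in both sources" are all assumptions, and the placeholder minerals are made up. It also assumes plain codes shaped like `4.DA.05` and does not handle leading zeros or letter suffixes.

```python
def strunz_ancestors(code: str) -> list[str]:
    """Expand a lowest-level code such as '4.DA.05' into its chain of levels."""
    code = code.strip()
    parts = code.split(".")
    levels = [parts[0]]                               # class, e.g. '4'
    if len(parts) > 1 and parts[1]:
        levels.append(f"{parts[0]}.{parts[1][0]}")    # division, e.g. '4.D'
        if len(parts[1]) > 1:
            levels.append(f"{parts[0]}.{parts[1]}")   # subdivision, e.g. '4.DA'
    if len(parts) > 2:
        levels.append(code)                           # group, e.g. '4.DA.05'
    return levels


def partition_by_shared_codes(
    gso: dict[str, str], mindat: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Split mineral names into those with the same code in both sources and the rest.

    Assumption: 'intersection' is read here as exact agreement on the code.
    """
    classified = {name: code for name, code in gso.items() if mindat.get(name) == code}
    unclassified = sorted((set(gso) | set(mindat)) - set(classified))
    return classified, unclassified


# Illustrative input only; 'examplite' and 'otherite' are made-up placeholders.
gso_codes = {"quartz": "4.DA.05", "halite": "3.AA.20", "examplite": "9.ZZ.99"}
mindat_codes = {"quartz": "4.DA.05", "halite": "3.AA.20",
                "examplite": "9.XX.01", "otherite": "1.AA.05"}

classified, unclassified = partition_by_shared_codes(gso_codes, mindat_codes)
print(strunz_ancestors("4.DA.05"))
print(classified)
print(unclassified)
```

Anything that lands in the second list would correspond to the "unclassified mineral material" bucket under this reading.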

skybristol commented 9 months ago

Along with the documentation in the code, I worked up a more thorough discussion of the decisions made in representing mineral materials from the two sources (Mindat and the Geoscience Ontology). This is in the discussion page for the upper level of the classification, mineral material. (Note: I'll be reworking all of the other entity types in this way, with a short description and pointer from the main wiki page in the Wikibase instance to a more robust description at the top level of the class hierarchy in the GeoKB ontology.)

The mineral material page and the queries laid out there should form the substance of reviewing this pull request. The queries should expose everything that resulted from the work documented and executed in the two code notebooks committed as part of this pull request. If I've made any mistakes, they should be readily exposed by looking through the items this work introduced.

skybristol commented 9 months ago

The following set of named entities from MRDS could not be immediately connected to anything we can readily source from the Mindat/GSO source material. We'll have to determine where these concepts fit and how to deal with them (if at all). Some are other types of geologic material that we might organize under rock materials or some other part of the GeoKB ontology.

Cinders, Telluride, Manganese Ox-Hydrous, Titaniferous Magnetitite, Sand and Gravel, Brine, Polycrase, Stetefeldite, Iron Oxides-Hydrated, Amalgam, Tripoli, Geodes, Potash, Oyster Shells, Schorlite, Hausmanite, Carnalite, Thalenite, Lapis-Lazuli, Vanadiferous Magnetite, Mineral Paint, Fullers Earth, Tyunjamunite, Mineral Pigments, Yttrialite, Vordisite, Fersmannite, Verde Antique
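For anyone wanting to reproduce a list like this, a check along the following lines could flag names with no counterpart in the mineral labels. This is only a sketch under assumptions: the thread does not describe how the matching was actually done, the helper names are hypothetical, and the sample inputs are placeholders rather than real MRDS or GeoKB data.

```python
import re


def normalize(name: str) -> str:
    """Lower-case a name and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def unmatched_names(source_names: list[str], known_labels: list[str]) -> list[str]:
    """Return source names whose normalized form is not among the known labels."""
    known = {normalize(label) for label in known_labels}
    return [name for name in source_names if normalize(name) not in known]


# Placeholder inputs for illustration only.
mrds_sample = ["Quartz", "Rock-Salt", "Brine"]
label_sample = ["quartz", "rock salt"]

print(unmatched_names(mrds_sample, label_sample))
```

Exact matching after normalization is deliberately conservative: misspelled or variant names would still show up as unmatched and need a manual look.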

skybristol commented 9 months ago

I added a couple of network visualizations to the notebooks with this pull request to help explain how I went about classifying minerals. The graph visualization in the query service (linked to from the Try it links where I've documented SPARQL syntax) can also be used in this regard.