Open skybristol opened 1 year ago
I've started working through the contents of the GSO minerals module and have come up with a few questions we need to answer before I proceed with pulling this into the GeoKB representation. I've attached a simplified Excel table with the major elements from the GSO minerals schema that we might be able to make use of. My questions are based on this.
@jrosera - Thanks for the comments. That's perfect!
I'll add a step that consults MRDS to subset the full list of "mineral materials" from the GSO. That seems reasonable, and we can always come back and pull in more entities as use cases expand beyond economic geology.
Thanks for providing feedback on things to ignore. There is no need to pull in information beyond what we will actually use, as we're not trying to set up a comprehensive resource here.
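As a rough sketch of that subsetting step: the record layout, identifiers, and function names below are hypothetical stand-ins, not the actual GSO or MRDS schemas, and the sample rows are made up for illustration.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so names compare reliably."""
    return " ".join(name.lower().split())


def subset_gso_by_mrds(gso_materials, mrds_names):
    """Keep only the GSO mineral materials whose label also appears in MRDS."""
    wanted = {normalize(n) for n in mrds_names}
    return [m for m in gso_materials if normalize(m["label"]) in wanted]


# Made-up sample data; real input would come from the GSO minerals module
# and from MRDS.
gso_materials = [
    {"id": "example:quartz", "label": "Quartz"},
    {"id": "example:pyrite", "label": "Pyrite"},
    {"id": "example:olivine", "label": "Olivine"},
]
mrds_names = ["quartz", " PYRITE "]

subset = subset_gso_by_mrds(gso_materials, mrds_names)
print([m["label"] for m in subset])
```

Matching on a normalized label is the simplest possible rule; synonyms and spelling variants would need more than this.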
For the structural group part of this, I think I'll start with the existing part of/has part properties we already have. These are designed to produce the reciprocal relationship you're talking about. There are different schools of thought in the semantics world on using a bunch of different specific properties vs. more general relationships. I go back and forth, and we can always rework the relationships at a later time if needed.
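To illustrate what I mean by the reciprocal relationship, here is a minimal sketch that stores claims as plain tuples. The helper name and the tuple representation are assumptions for illustration only; they are not how the GeoKB stores statements.

```python
# Each property maps to the property used for the reverse direction.
INVERSE = {"part of": "has part", "has part": "part of"}


def add_with_inverse(claims, subject, prop, obj):
    """Record a claim and its reciprocal so both items point at each other."""
    claims.add((subject, prop, obj))
    claims.add((obj, INVERSE[prop], subject))
    return claims


claims = set()
# Forsterite is a member of the olivine structural group.
add_with_inverse(claims, "forsterite", "part of", "olivine group")

for claim in sorted(claims):
    print(claim)
```

Using one general pair of properties keeps the model small. More specific properties could replace it later by extending the `INVERSE` mapping.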
I had worked out the basics of a method to pull in the full Strunz classification system from source material. I'll look at that again, but I may end up simplifying it for our immediate purposes and using only the part of the classification referenced in the GSO.
@skybristol
I cannot speak too much to ongoing semantics debates, but I would just recommend that you keep in mind that very specific mineral names often require lab methods that are not necessarily part of routine economic geology surveys - especially the older reports that more or less list out mineral phases that were observed in mapping / drilling / thin section etc. While there is great information baked into specific, end-member mineral names based on full geochemical characterization, much of what we work with is at the slightly more general level (e.g., structural group).
My guess is that if you pull all of the unique MRDS materials flagged as ore or gangue you will have a mix of names from specific to structural group.
@jrosera - That's the approach I'm taking this morning. Looking at that list of 743 unique names found in ore or gangue, we do not have complete alignment with either the Geoscience Ontology or Mindat.
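The alignment check can be sketched with simple set operations. The function name and the placeholder names below are hypothetical, and the sample lists are invented; they say nothing about what the GSO or Mindat actually contain.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so names compare reliably."""
    return " ".join(name.lower().split())


def alignment_report(mrds_names, gso_labels, mindat_names):
    """Bucket each MRDS name by which reference sources also contain it."""
    mrds = {normalize(n) for n in mrds_names}
    gso = {normalize(n) for n in gso_labels}
    mindat = {normalize(n) for n in mindat_names}
    return {
        "in_both": sorted(mrds & gso & mindat),
        "gso_only": sorted((mrds & gso) - mindat),
        "mindat_only": sorted((mrds & mindat) - gso),
        "unmatched": sorted(mrds - gso - mindat),
    }


# Placeholder names only; the real input is the list of unique ore and
# gangue names from MRDS.
report = alignment_report(
    mrds_names=["Mineral A", "Mineral B", "Mineral C", "Mineral D"],
    gso_labels=["mineral a", "mineral b"],
    mindat_names=["Mineral A", "Mineral C"],
)
for bucket, names in report.items():
    print(bucket, names)
```

The `unmatched` bucket is the one that needs a human look, since it mixes spelling variants, general group names, and things neither source covers.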
The bottom line is that mineralogy is complex, just like biological taxonomy, which is closer to my own domain. Every scientist who approaches this problem of classification and identification is going to come at it a little differently. The whole system is in flux at any given point, and there is no perfect way of capturing the complexity of the state of scientific knowledge in the simplicity of linked open data and explicit semantics. As you say, at any point in time, some things that show up in an attempt to classify everything are going to be in the process of being studied and better characterized.
Our overall goal with the GeoKB is to have named and identified entities from all the subject matter we study and encounter in our specific scientific portfolio. This is conceptually similar to what Peter Schweitzer has done with much of the USGS Thesaurus work, but we are taking that further into other domains and focusing significant attention on linking "our concepts" with linkable entities from other knowledge representations in order to do the following:
We need to finish building out the minerals reference in the GeoKB. This was started with a combination of MRData and Mindat reference materials, where I attempted to develop a new type of conceptual mapping to named entities that can be classed as minerals, commodities, or even chemical elements. While we may need to revisit the notion of single element minerals, the mineral commodity approach seems like it should work.
This work will include bringing in a full representation of the GSO minerals module processed through software code, factoring in where I have already instantiated some items from other sources.
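One way the "already instantiated" check might look, as a sketch under assumptions: the function name, the label-keyed lookup, and the identifiers are all hypothetical, and matching on label alone is a simplification.

```python
def plan_load(incoming, existing_by_label):
    """Split incoming items into ones to create and ones to merge into existing items."""
    to_create, to_update = [], []
    for item in incoming:
        key = " ".join(item["label"].lower().split())
        if key in existing_by_label:
            # Pair the existing item's identifier with the incoming record.
            to_update.append((existing_by_label[key], item))
        else:
            to_create.append(item)
    return to_create, to_update


# Hypothetical identifiers; real ones would come from the knowledge base.
existing_by_label = {"quartz": "existing-item-1"}
incoming = [
    {"label": "Quartz", "source": "GSO"},
    {"label": "Pyrite", "source": "GSO"},
]

to_create, to_update = plan_load(incoming, existing_by_label)
print(len(to_create), "to create;", len(to_update), "to update")
```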