skybristol / geokb

Data processing workflows for initializing and building the Geoscience Knowledgebase
The Unlicense

Work out code list exports for LIMS (and others) #32

Closed skybristol closed 3 months ago

skybristol commented 10 months ago

One use case being explored is how the GeoKB can serve as a direct source for things like the code lists needed in various USGS data/information systems. The initial case is the upgraded Laboratory Information Management System (LIMS) for the Analytical Geochemistry Lab. The sample submission/management part of that platform has a number of fields where submitters choose things like rock type, geologic age, and other parameters. These end up being important metadata about samples, and each value should trace to an established source of definition. That definition should incorporate details about the original sources used and linkages to additional information that can be useful as data progress through a logical processing pipeline (managed through our USGS Quality Management System) to eventual release. The GeoKB is a place to gather this information, much of which will be sourced from other established and published material, often from non-USGS sources.

This issue starts the process of figuring out the best way to support this use case, with an initial focus on rock types: they are something we're already dealing with in the GeoKB and one of the larger sets of input values for the LIMS. The main questions to pursue:

  1. Have we effectively sourced and organized what is necessary to accomplish LIMS needs within the rock material type classification being worked through in #27?
  2. Do we need any additional attributes recorded in the model to support filtering the larger set of rock types to what the LIMS needs?
  3. Have we articulated the query language necessary to pull the values the LIMS needs to operate with this source?
  4. Is there a technical method for the LIMS to dynamically get its information from the GeoKB, or do we need steps documented in a protocol for humans to follow in querying for rock types from the GeoKB and importing into the LIMS configuration?
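
As a starting point for questions 3 and 4, here is a minimal sketch of what a code list export could look like. It builds a SPARQL query for every item that is an instance of a given class, then flattens a response in the standard SPARQL 1.1 JSON results format into pick-list rows. The function names, the `example.org` URIs, and the idea that rock types hang off a single "instance of" style property are all assumptions for illustration, not the actual GeoKB model.

```python
from string import Template

# Query skeleton: items that belong to one code list class, with their labels.
# The property and class URIs are filled in by the caller because the real
# GeoKB identifiers are not assumed here.
QUERY_TEMPLATE = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item <$instance_of> <$code_list_class> ;
        rdfs:label ?label .
  FILTER(LANG(?label) = "$lang")
}
ORDER BY ?label
""")


def build_code_list_query(instance_of, code_list_class, lang="en"):
    """Return a SPARQL query string for one code list (hypothetical helper)."""
    return QUERY_TEMPLATE.substitute(
        instance_of=instance_of,
        code_list_class=code_list_class,
        lang=lang,
    )


def bindings_to_rows(sparql_json):
    """Flatten SPARQL 1.1 JSON results into rows of item URI plus label."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        rows.append(
            {"id": binding["item"]["value"], "label": binding["label"]["value"]}
        )
    return rows
```

Keeping the item URI next to each label is the part that matters: whether the LIMS pulls the list dynamically or a person imports it by following a protocol, the stored value can still be traced back to its GeoKB item.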

skybristol commented 10 months ago

We should ultimately look at all of the code list/pick list concepts that the LIMS uses and provide clarity on where these are sourced. As we look toward the future data architecture for the next-generation National Geochemical Database (and any data flowing through a process such as that one), we want to be able to link to other details about entities represented in those data with complete confidence. The basic concept of namespaces can apply here whether or not it is actually encoded into the data. If we source the table (or whatever that configuration principle looks like) in a given system like the LIMS from a particular "module" in the GeoKB, even for mundane things like a listing of world countries or U.S. states, then when the data come out, we know that we can reliably tie a name identifier (e.g., Sweden) to a corresponding knowledgebase item (e.g., Kingdom of Sweden, aka Sweden). From there, we can leverage whatever other linkages or attributes we develop in concert with the data that value was connected with.
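
To make the namespace idea concrete, here is a rough sketch of resolving a name within a given "module" back to a knowledgebase item, using the Sweden example. The `KbItem` structure, the `countries` module name, and the `Q-EXAMPLE-1` identifier are placeholders I made up for illustration; none of them reflect real GeoKB identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KbItem:
    namespace: str       # hypothetical GeoKB "module" the value was sourced from
    item_id: str         # placeholder identifier, not a real GeoKB ID
    label: str           # official label
    aliases: tuple = ()  # other names used in practice


def build_name_index(items):
    """Index every label and alias by (namespace, case-folded name)."""
    index = {}
    for item in items:
        for name in (item.label, *item.aliases):
            key = (item.namespace, name.casefold())
            existing = index.setdefault(key, item)
            if existing != item:
                # The same name pointing at two items in one module is ambiguous.
                raise ValueError(f"ambiguous name {name!r} in {item.namespace!r}")
    return index


def resolve(index, namespace, name):
    """Return the item a name refers to within a namespace, or None."""
    return index.get((namespace, name.casefold()))


items = [KbItem("countries", "Q-EXAMPLE-1", "Kingdom of Sweden", ("Sweden",))]
index = build_name_index(items)
match = resolve(index, "countries", "Sweden")
```

Scoping the lookup by module is what lets a bare string like "Sweden" coming out of the LIMS be tied to one item without the namespace being written into the data itself.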

In looking over the source tables from the WBSSS move to LIMS, I'm taking the opportunity to review and refresh previous work on things like countries of the world. In doing so, I'm also taking a look at documentation so that we have logical SPARQL queries teed up that can be used to pull values for LIMS configuration or other purposes. We will need to get into certain details, like the fact that current practice uses "Afghanistan" rather than "Islamic Republic of Afghanistan" (the current official name). We reflect these dynamics using aliases, but that doesn't equate to a direct query path and may need to be revisited in the GeoKB architecture.
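
One possible query path for the alias problem, sketched under the assumption that the GeoKB exposes labels as `rdfs:label` and aliases as `skos:altLabel` (as Wikibase-style RDF typically does): match the name in use against either predicate with a SPARQL 1.1 property path, and return the official label alongside the item. The helper names are hypothetical and this has not been run against the GeoKB endpoint.

```python
def sparql_string_literal(text, lang="en"):
    """Quote text as a language-tagged SPARQL literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def build_name_lookup_query(name, lang="en"):
    """Build a query that finds items whose label OR alias equals the name."""
    literal = sparql_string_literal(name, lang)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?item ?label WHERE {\n"
        # The path alternative (|) matches the name as a label or as an alias.
        f"  ?item rdfs:label|skos:altLabel {literal} ;\n"
        "        rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}")\n'
        "}\n"
    )


query = build_name_lookup_query("Afghanistan")
```

This finds the item from either name, but it does not say which alias is the one in common use. If the LIMS needs to display "Afghanistan" specifically, that preference would still have to be recorded somewhere in the model.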