microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/

Set up Fuseki Server (Jing's NERSC SPIN training capstone) #496

Status: Open. PeopleMakeCulture opened this issue 8 months ago.

PeopleMakeCulture commented 8 months ago

The goal of this ticket is two-fold:

  1. Stand up a Fuseki server on SPIN to host an instance of a graph database that the graph search API can query against. Details are in #401.

  2. Give @PeopleMakeCulture an opportunity to set up a new service on SPIN as the capstone project for self-directed SPIN training.

PeopleMakeCulture commented 8 months ago

From Cory @NERSC:

We do like for you to build the example application from the exercises in the Rancher "spinup" project as a starting point, because it incorporates a lot of the features you will likely use (storage, secrets, config maps, ingresses, ports / cluster IPs) but also shows some of the unique aspects of Spin around storage types available, security requirements, etc. It also serves as a sort of homework assignment that we can "grade". :D

So, please start with that, and let us know when you're done.

Looks like I will be setting up a service in the spinup project first!

PeopleMakeCulture commented 8 months ago

NOTE: See documentation of existing NMDC graph DBs here: https://github.com/microbiomedata/issues/issues/638

turbomam commented 8 months ago

I would like to build upon https://github.com/microbiomedata/issues/issues/638 and think about the isolation of knowledge in the NMDC Fuseki instance on SPIN, as well as its ability to integrate with resources from other linked datasets.

Is this an OK place to do that?

I see one dataset, nmdc, in https://fuseki.polyneme.xyz. That's one level of isolation.

I don't believe named graphs are being used in https://fuseki.polyneme.xyz at this time. This query lists any named graphs in the dataset:

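```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# List every named graph that holds at least one triple.
# Triples in the default graph are not matched by GRAPH ?g.
SELECT DISTINCT ?g
WHERE {
  GRAPH ?g {
    ?sub ?pred ?obj .
  }
}
```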
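
The same check can be scripted over the SPARQL 1.1 Protocol. This is a minimal sketch using only the Python standard library. It assumes the nmdc dataset's query service is at https://fuseki.polyneme.xyz/nmdc/query (a common Fuseki layout, not confirmed here). The helper names `build_request`, `graph_iris` and `list_named_graphs` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the query service path for the "nmdc" dataset; verify on the server.
ENDPOINT = "https://fuseki.polyneme.xyz/nmdc/query"

# Same intent as the query above: list named graphs that contain any triple.
NAMED_GRAPHS_QUERY = """
SELECT DISTINCT ?g
WHERE { GRAPH ?g { ?sub ?pred ?obj . } }
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def graph_iris(results: dict) -> list:
    """Extract the ?g bindings from a SPARQL JSON results document."""
    return [row["g"]["value"] for row in results["results"]["bindings"]]


def list_named_graphs(endpoint: str = ENDPOINT) -> list:
    """Run the query; an empty list means the dataset has no named graphs."""
    request = build_request(endpoint, NAMED_GRAPHS_QUERY)
    with urllib.request.urlopen(request, timeout=30) as response:
        return graph_iris(json.load(response))
```

Calling `list_named_graphs()` performs the network request. Keeping request construction and result parsing in separate functions lets both be checked without a live server.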

I hope any properties in this database that do not come directly from the LinkML language, the nmdc-schema, or the nmdc-ontology …

In the NMDC AWS GraphDB, the data and the nmdc-schema both use what @cmungall calls "non-native URIs" like https://w3id.org/mixs/0000012, as opposed to https://w3id.org/nmdc/env_broad_scale. I would like for us to think through the consequences of using schema-native URIs as the Fuseki database does.
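
One consequence: a query written against one URI form silently misses data stored under the other. The sketch below rewrites predicates in both directions with a lookup table. The table is hypothetical. Its single entry pairs the two URIs contrasted above, and a real table would presumably be generated from the schema's LinkML `slot_uri` declarations. The function names are illustrative only.

```python
# Hypothetical mapping from schema-native slot URIs to non-native URIs.
# The one entry pairs the two URIs mentioned in this comment.
NATIVE_TO_NON_NATIVE = {
    "https://w3id.org/nmdc/env_broad_scale": "https://w3id.org/mixs/0000012",
}

Triple = tuple  # (subject, predicate, object), all plain strings here


def to_non_native(triples: list, mapping: dict = NATIVE_TO_NON_NATIVE) -> list:
    """Rewrite native predicates to non-native ones; unmapped ones pass through."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


def to_native(triples: list, mapping: dict = NATIVE_TO_NON_NATIVE) -> list:
    """Inverse rewrite: non-native predicates back to schema-native ones."""
    inverse = {non_native: native for native, non_native in mapping.items()}
    return [(s, inverse.get(p, p), o) for s, p, o in triples]
```

The rewrite is only lossless if the mapping is one-to-one. If two native slots ever shared a non-native URI, the inverse table would keep only one of them.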

PeopleMakeCulture commented 7 months ago

Feature requirements for a production-ready graph database

RDF-Gen Alignment

Mark's process is documented here

Donny's process can be viewed here

Include a named graph, nmdc:nmdc, for the schema representation
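
One way to keep the schema separate from instance data is to emit schema triples as N-Quads tagged with the schema graph. Instance data would stay in the default graph. This is a minimal sketch. It assumes `nmdc:` expands to `https://w3id.org/nmdc/`, so `nmdc:nmdc` becomes `https://w3id.org/nmdc/nmdc`. It handles only IRIs and plain string literals, and its names are hypothetical.

```python
from typing import Optional

NMDC = "https://w3id.org/nmdc/"  # assumed expansion of the nmdc: prefix
SCHEMA_GRAPH = NMDC + "nmdc"     # nmdc:nmdc


def term(value: str) -> str:
    """Serialize a value as an N-Quads term.

    Simplification: any http(s) string is treated as an IRI, and everything
    else as a plain string literal with N-Quads escaping applied.
    """
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_nquads(triples: list, graph: Optional[str] = None) -> str:
    """Write one N-Quads line per triple; graph=None targets the default graph."""
    suffix = f" <{graph}>" if graph else ""
    return "".join(f"{term(s)} {term(p)} {term(o)}{suffix} .\n" for s, p, o in triples)
```

With the schema loaded this way, the named-graph query earlier in this thread would list `https://w3id.org/nmdc/nmdc`. A `GRAPH <https://w3id.org/nmdc/nmdc> { ... }` pattern could then restrict a query to schema terms.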

Standardize type representations

Approach

  1. Aliasing: Mongo changesheets might use a textual CURIE (e.g. "lat-lon") from an external vocabulary, but we would convert it to the primary key for that term in the external vocabulary.
  2. Stricter enforcement for changesheets
  3. Should we have our own namespace of terms?
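
Points 1 and 2 could be combined into a single normalization step run on each changesheet value. This is a minimal sketch with a hypothetical alias table. `MIXS:0000009` is assumed to be the MIxS identifier for lat_lon and should be checked against the MIxS release in use. In strict mode, unknown labels are rejected instead of being passed through.

```python
# Hypothetical alias table: textual labels that may appear in changesheets,
# mapped to the primary key of the term in the external vocabulary.
# MIXS:0000009 for lat_lon is an assumption; verify against MIxS.
ALIASES = {
    "lat-lon": "MIXS:0000009",
    "lat_lon": "MIXS:0000009",
}


class UnknownTermError(ValueError):
    """Raised in strict mode when a term has no known primary key."""


def normalize_term(value: str, aliases: dict = ALIASES, strict: bool = True) -> str:
    """Return the primary key for a changesheet term."""
    key = value.strip()
    if key in aliases.values():
        return key  # already a primary key
    if key in aliases:
        return aliases[key]  # textual alias -> primary key
    if strict:
        raise UnknownTermError(f"no primary key known for {value!r}")
    return key  # lenient mode: pass the value through unchanged
```

Strict mode puts the enforcement at changesheet submission time, so a typo fails loudly instead of creating a new, unlinked term.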

turbomam commented 2 months ago

@PeopleMakeCulture

I was really excited when we were working on things like this together, but maybe this issue can be closed now due to lack of activity?