Set up Fuseki Server (Jing's NERSC SPIN training capstone)

PeopleMakeCulture commented 8 months ago

The goal of this ticket is two-fold:

Stand up a Fuseki server on SPIN to host an instance of a graph database that the graph search API can query. Details in #401.
Give @PeopleMakeCulture an opportunity to set up a new service on SPIN as the capstone project for self-directed SPIN training.

PeopleMakeCulture commented 8 months ago

From Cory @NERSC:

We do like for you to build the example application from the exercises in the Rancher "spinup" project as a starting point, because it incorporates a lot of the features you will likely use (storage, secrets, config maps, ingresses, ports / cluster IPs) but also shows some of the unique aspects of Spin around storage types available, security requirements, etc. It also serves as a sort of homework assignment that we can "grade". :D

So, please start with that, and let us know when you're done.

Looks like I will be setting up a service in the spinup project first!

PeopleMakeCulture commented 8 months ago

NOTE: See documentation of existing NMDC graph DBs here: https://github.com/microbiomedata/issues/issues/638

turbomam commented 8 months ago

I would like to build upon https://github.com/microbiomedata/issues/issues/638 and think about the isolation of knowledge in the NMDC SPIN Fuseki, as well as the ability to integrate with resources from other linked data sets.

Is this an OK place to do that?

I see one dataset in https://fuseki.polyneme.xyz : nmdc. That's one level of isolation.

I don't believe named graphs are being used in https://fuseki.polyneme.xyz at this time. A query like the following would list any that exist:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?g
WHERE {
  GRAPH ?g {
    ?sub ?pred ?obj .
  }
}
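For anyone who wants to run that check from a script, here is a minimal Python sketch using only the standard library. It assumes the server accepts SPARQL Protocol GET requests and returns SPARQL 1.1 JSON results. The endpoint URL is a parameter because the exact query path on this server is not stated in this thread; the function names are mine, not from any NMDC code.

```python
import json
import urllib.parse
import urllib.request

# Same query as above, without the unused prefixes.
QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?sub ?pred ?obj . } }"


def build_request(endpoint, query=QUERY):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def graph_names(results_json):
    """Extract the ?g values from a SPARQL 1.1 JSON results document."""
    doc = json.loads(results_json)
    return [b["g"]["value"] for b in doc["results"]["bindings"] if "g" in b]


def list_named_graphs(endpoint):
    """Send the query and return the named graph URIs (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint)) as resp:
        return graph_names(resp.read().decode("utf-8"))
```

An empty list from `list_named_graphs(...)` would be consistent with no named graphs being in use. Fuseki commonly exposes the query service under the dataset name, but the path to pass in should be checked against this server's configuration.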

I hope any properties in this database that do not come directly from the LinkML language, the nmdc-schema, or the nmdc-ontology:

will be defined
will use an isolated prefix/URI base, not nmdc:. I see you, https://w3id.org/nmdc/depends_on
will be asserted in a separate named graph
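To make the last point concrete, here is a rough Python sketch of how a loader could route non-schema properties into their own named graph when writing N-Quads. The graph URI, the function names, and the idea of passing in an explicit set of schema-defined predicates are all assumptions for illustration; it also assumes every object is a URI, which real data would not guarantee.

```python
# Hypothetical graph name for properties that are not defined by the schema.
EXTENSION_GRAPH = "https://example.org/graphs/extensions"


def to_nquad(subject, predicate, obj, graph=None):
    """Serialize one URI-only statement as an N-Quads line."""
    terms = [f"<{subject}>", f"<{predicate}>", f"<{obj}>"]
    if graph is not None:
        terms.append(f"<{graph}>")  # fourth term names the graph
    return " ".join(terms) + " ."


def assign_graphs(triples, schema_predicates):
    """Keep schema predicates in the default graph; isolate everything else."""
    lines = []
    for subject, predicate, obj in triples:
        graph = None if predicate in schema_predicates else EXTENSION_GRAPH
        lines.append(to_nquad(subject, predicate, obj, graph))
    return lines
```

With this split, a predicate such as https://w3id.org/nmdc/depends_on would land in the extension graph unless it is listed in `schema_predicates`, so queries can include or exclude those assertions by graph.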

In the NMDC AWS GraphDB, the data and the nmdc-schema both use what @cmungall calls "non-native URIs" like https://w3id.org/mixs/0000012, as opposed to https://w3id.org/nmdc/env_broad_scale. I would like for us to think through the consequences of using schema-native URIs as the Fuseki database does.

PeopleMakeCulture commented 7 months ago

Feature requirements for production-ready graph database

RDF-Gen Alignment

Mark's process is documented here

Donny's process can be viewed here

Include named graph `nmdc:nmdc` for schema representation

see: http://35.173.42.85/graphs

Standardize type representations

Replace nmdc:type with rdf:type for predicates in the data store.
Use a URI or CURIE, not a string, for values of rdf:type.
For predicates, pick either an id-based CURIE (e.g. MIXS:12) or a textual CURIE (e.g. broad scale env context).
If we use textual CURIEs, we will need some way to integrate with other ontologies (via the numeric id CURIE).
Triples associated with types should be included (e.g. http://35.173.42.85/resource?uri=https:%2F%2Fw3id.org%2Fmixs%2F0000012&role=subject).
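A minimal Python sketch of the first two points, assuming triples are held as plain (subject, predicate, object) tuples and that type values currently arrive as CURIE-shaped strings such as "nmdc:Biosample". The prefix map and function names are illustrative, not taken from the existing RDF generation code.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NMDC = "https://w3id.org/nmdc/"
NMDC_TYPE = NMDC + "type"

# Assumed prefix map; a real one would cover every prefix the data uses.
PREFIXES = {"nmdc": NMDC}


def expand_curie(value, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI, or return None if the prefix is unknown."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return None


def normalize_type_triple(triple):
    """Rewrite (s, nmdc:type, 'nmdc:X') as (s, rdf:type, <full URI of X>)."""
    subject, predicate, obj = triple
    if predicate != NMDC_TYPE:
        return triple  # not a type statement, leave untouched
    uri = expand_curie(obj)
    if uri is None:
        return triple  # unrecognized value, leave it for manual review
    return (subject, RDF_TYPE, uri)
```

Leaving unrecognized values untouched is a deliberate choice in this sketch so that bad type strings stay visible instead of being silently turned into URIs.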

Approach

Aliasing - Mongo changesheets might use a textual CURIE (e.g. "lat-lon" from an external vocabulary), but we would convert that to the primary key for that term in the external vocabulary.
Stricter enforcement for changesheets
Should we have our own namespace of terms?
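The aliasing idea could look roughly like this in Python. The alias table is hypothetical and holds a single entry, the env_broad_scale / MIXS 0000012 pairing mentioned earlier in this thread; a real table would be generated from the external vocabulary. Raising an error on unknown terms is one possible form of the stricter changesheet enforcement.

```python
MIXS_BASE = "https://w3id.org/mixs/"

# Hypothetical alias table: textual name -> primary-key CURIE.
ALIASES = {
    "env_broad_scale": "MIXS:0000012",
}


def normalize_key(text):
    """Fold case, hyphens, and spaces so 'Env-Broad-Scale' matches 'env_broad_scale'."""
    return text.strip().lower().replace("-", "_").replace(" ", "_")


def resolve_alias(text, aliases=ALIASES):
    """Return the primary-key CURIE for a textual term, or raise KeyError."""
    key = normalize_key(text)
    if key not in aliases:
        raise KeyError(f"no primary key known for {text!r}")
    return aliases[key]


def curie_to_uri(curie):
    """Expand a MIXS CURIE to its w3id.org URI."""
    prefix, _, local = curie.partition(":")
    if prefix != "MIXS" or not local:
        raise ValueError(f"not a MIXS CURIE: {curie!r}")
    return MIXS_BASE + local
```

For example, `curie_to_uri(resolve_alias("env-broad-scale"))` yields https://w3id.org/mixs/0000012, while a term missing from the table fails loudly instead of being stored as free text.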

turbomam commented 2 months ago

@PeopleMakeCulture

I was really excited when we were working on things like this together, but maybe this issue can be closed now due to lack of activity?
