lcnp / GSIP

Groundwater Surface water Interop Experiment
MIT License

Get concretizedBy dataset in addition to subjectOf #12

Closed denevers closed 2 weeks ago

denevers commented 1 month ago

The goal of this update is to support a change in the ontology in how an /id/ is linked to its data representations. The new ontology is outlined below, but note that it has since changed so significantly that the description in this ticket is nearly useless.

See docs/infoset_ontology.adoc.

A real thing can be the subject of a small dataset, and this dataset can concretize a larger Infoset.

<id/MineralOccurrence/BC/082ENE012> schema:subjectOf [
             cgdn:concretizes <info/Infoset/BC/MineralOccurrence> 
]

An Infoset can also be concretized by a dataset:

<info/Infoset/BC/MineralOccurrence>  cgdn:concretizedBy [
        cgdn:concretizes <info/Infoset/BC/MineralOccurrence> ;
]

This provides a structure to represent a thing (/id/) by a small dataset about that thing, and to relate this small dataset to a larger collection (an Infoset). Both the small dataset and the Infoset can be represented in multiple formats. This information is rendered as an HTML page called a Landing Page.

The Landing Page is built using FreeMarker. The FreeMarker engine creates the page on the server side, so the engine can call Java classes, which in turn query an RDF model loaded in memory. The challenge is to pull all the data needed to populate the page from the triple store.

The FreeMarker engine is passed an instance of a class called "ModelWrapper" (nrcan.lms.gsc.gsip.model.ModelWrapper) that provides a FreeMarker-friendly API over a Jena model (org.apache.jena.rdf.model.Model).

The template generating the page is infohtml.ftl in the templates folder. FreeMarker essentially reads any text file and substitutes specially marked sections, much like PHP.

Variable substitutions are marked using ${some_variable}, while more complex bits, like expressions, are in <#statement> blocks.

So a typical mixture of HTML and FreeMarker looks like this:

<#list grp?keys as p>
    <li><strong>${p}:</strong>
        <#list grp[p] as link>
             <a
                href="${link.getUrl()}?lang=${locale}"
                title="${link.getUrl()}">${link.getResLabel()}</a>
        </#list>
    </li>
</#list>

Data is passed to the FreeMarker engine as a hashtable (or "dictionary"). Values can be literals or class instances whose methods can be invoked. In this case, one important variable passed to the engine is model, which is of type nrcan.lms.gsc.gsip.model.ModelWrapper.
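As a rough illustration of the "dictionary in, text out" idea, here is a minimal sketch in plain Java. It is not FreeMarker and does not use its API: ToyTemplate and its render method are hypothetical names, and only simple ${name} placeholders are handled (no method calls, no <#list> blocks).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-in for a template engine: NOT FreeMarker, only the ${name} idea.
class ToyTemplate {
    // Matches ${name} where name is made of letters, digits or underscores.
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replace every ${key} with the string form of dataModel.get(key).
    // Unknown keys are rendered as an empty string (an assumption of this sketch).
    static String render(String template, Map<String, Object> dataModel) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = dataModel.get(m.group(1));
            String text = (value == null) ? "" : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The data model is a plain map from variable name to value.
        Map<String, Object> root = new HashMap<>();
        root.put("locale", "fr");
        System.out.println(render("<a href=\"page?lang=${locale}\">link</a>", root));
    }
}
```

The real engine goes further than this sketch: values in the map can be objects such as ModelWrapper, and the template can call their methods.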

So if you find

${model.getLocText("xxx","yyy")} in the template, it actually invokes

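    /**
     * Helper function to return a string in the preferred language.
     * @param en text to return for any locale other than "fr"
     * @param fr text to return when the locale is "fr"
     * @return fr if the locale is "fr", otherwise en
     */
    public String getLocText(String en, String fr)
    {
        return locale.equals("fr") ? fr : en;
    }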

in nrcan.lms.gsc.gsip.model.ModelWrapper.

To add the extra functionality, we will alter one function of ModelWrapper that pulls datasets for an /id/:

/**
     * Get a list of representations (subjectOf resources) for this resource
     * @param res data resource, can be a blank node
     * @return
     */
    private List<Resource> getRepresentations(Resource res)
    {
        //Logger.getAnonymousLogger().log(Level.INFO,"subjectOf "+ res.getURI());
        StmtIterator statements = res.listProperties(SCHEMA.subjectOf);
        List<Resource> subjectOf = new ArrayList<Resource>();
        while(statements.hasNext())
        {
            Statement s = statements.next();
            subjectOf.add(s.getResource());
        }
        return subjectOf;
    }

denevers commented 1 month ago

The predicate subjectOf provides a direct link from an /id/ to a dataset.

(Note: this has been changed to reflect the revised ontology.)

<id/MineralOccurrence/BC/082ENE012>    schema:subjectOf [].

The blank node is a dataset. However, we can only assume that it is: the ontology does not state this, nor can it be inferred.

<id/MineralOccurrence/BC/082ENE012> 
    schema:subjectOf [
         cgdn:concretizes <info/Infoset/BC/MineralOccurrence> ;  # new
        dct:format "text/html" ;
        rdfs:label "html: ELLSWORTH, BEV, TUFF, MAL, JOHN, MOSH", "BC Mineral Occurrence 082ENE012" @en, "C-B Indice minéralisé 082ENE012" @fr ;
        schema:geo [
            a schema:GeoCoordinates ;
            schema:Latitude 49.5075 ;
            schema:Longitude -118.986111
        ] ;
        schema:name "html: ELLSWORTH, BEV, TUFF, MAL, JOHN, MOSH", "BC Mineral Occurrence 082ENE012" @en, "C-B Indice minéralisé 082ENE012" @fr ;
        schema:provider <https://www2.gov.bc.ca/gov/content?id=279686BC782F47ECA7B257376391D210> ;
        schema:url "http://minfile.gov.bc.ca/Summary.aspx?minfilno=082ENE012"
    ],

The link to the Infoset is in the blank node. The blank node's name is not persistent: it is just a name that is valid inside the file, not in the triple store.

[
  cgdn:concretizes <info/Infoset/BC/MineralOccurrence> ;
dct:format "application/gml+xml;subtype=erml" 
]

can also be represented as

_:b1   cgdn:concretizes <info/Infoset/BC/MineralOccurrence> ;
      dct:format "application/gml+xml;subtype=erml" ;

_:something is the convention for naming a blank node.

The path we need to walk is

/id/ -> subjectOf -> blank node (dataset) -> concretizes -> /info/

in ttl

<id/MineralOccurrence/BC/082ENE012> schema:subjectOf _:b1.
_:b1  cgdn:concretizes <info/Infoset/BC/MineralOccurrence> ;

We want those <info/Infoset/BC/MineralOccurrence> resources.
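The two-hop walk described above can be sketched in plain Java. This is only an illustration: the list of string triples stands in for the Jena Model, and PathWalk, objects and infosetsFor are hypothetical names, not part of ModelWrapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory stand-in for the RDF model: each triple is {subject, predicate, object}.
class PathWalk {
    // Return the objects of all triples matching the given subject and predicate.
    static List<String> objects(List<String[]> triples, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                result.add(t[2]);
            }
        }
        return result;
    }

    // Walk /id/ -> subjectOf -> dataset (blank node) -> concretizes -> /info/.
    static List<String> infosetsFor(List<String[]> triples, String id) {
        List<String> infosets = new ArrayList<>();
        for (String dataset : objects(triples, id, "schema:subjectOf")) {
            for (String infoset : objects(triples, dataset, "cgdn:concretizes")) {
                if (!infosets.contains(infoset)) {
                    infosets.add(infoset); // several datasets may share one Infoset
                }
            }
        }
        return infosets;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            new String[] {"id/MineralOccurrence/BC/082ENE012", "schema:subjectOf", "_:b1"},
            new String[] {"_:b1", "cgdn:concretizes", "info/Infoset/BC/MineralOccurrence"});
        System.out.println(infosetsFor(triples, "id/MineralOccurrence/BC/082ENE012"));
    }
}
```

Because every format of the same dataset points at the same Infoset, the sketch removes duplicates before returning.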

denevers commented 1 month ago

This brings us to the content of the Model that FreeMarker has access to. It is not the whole model; it is the result of an optimistic "hope I got everything" SPARQL query stored in the templates folder as describe.ftl.

Note that this is itself a FreeMarker template: ${resource?replace(' ','%20')} will be replaced with the resource URI (with spaces replaced by %20).

#describes a non info
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX schema: <https://schema.org/>
PREFIX cgdn: <https://geoconnex.ca/id/onto/> 
CONSTRUCT {
    <${resource?replace(' ','%20')}> ?p ?o. # all resources from any property
    ?o ?p2 ?o2. # all predicate-of an any object selected above
    <${resource?replace(' ','%20')}> ?p3 ?l. 
    ?o2 rdfs:label ?l2.
    ?o schema:geo ?g.
    ?g ?pg ?pp.
    }
WHERE {<${resource?replace(' ','%20')}> ?p ?o. ?o ?p2 ?o2. <${resource?replace(' ','%20')}> ?p3 ?l. 
 OPTIONAL {?o2 rdfs:label ?l2.}. 
 OPTIONAL {?o schema:geo ?g. ?g ?pg ?pp}. 
 FILTER (isLiteral(?l))}
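
The effect of ${resource?replace(' ','%20')} can be mimicked in plain Java. This is a minimal sketch under assumptions: DescribeQuery, escapeSpaces and build are hypothetical names, the query is cut down to the first triple pattern only, and the URI in main is made up for illustration.

```java
// Mimics what ${resource?replace(' ','%20')} does before the query is sent.
class DescribeQuery {
    // Replace spaces only, as the template does; no other character is escaped.
    static String escapeSpaces(String resourceUri) {
        return resourceUri.replace(" ", "%20");
    }

    // Build a cut-down CONSTRUCT query for one resource (first pattern only).
    static String build(String resourceUri) {
        String iri = "<" + escapeSpaces(resourceUri) + ">";
        return "CONSTRUCT { " + iri + " ?p ?o . } WHERE { " + iri + " ?p ?o . }";
    }

    public static void main(String[] args) {
        // Hypothetical URI containing a space, to show the substitution.
        System.out.println(build("http://localhost:8080/gsip/id/Some Thing/1"));
    }
}
```

Only spaces are handled here, matching the template; a URI containing other characters that are illegal in an IRI would still produce an invalid query.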

denevers commented 1 month ago

Let's look at what this gives us, using the ttl output:

http://localhost:8080/gsip/info/MineralOccurrence/BC/082ENE001?f=ttl

@prefix cgdn:   <http://localhost:8080/gsip/id/onto/> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

owl:Thing  rdf:type          rdfs:Resource , rdfs:Class , owl:Class;
        rdfs:subClassOf      rdfs:Resource , owl:Thing;
        owl:equivalentClass  owl:Thing .

<https://www2.gov.bc.ca/gov/content?id=279686BC782F47ECA7B257376391D210>
        rdfs:label  "Commission géologique de la Colombie-Britannique"@fr , "British Columbia Geological Survey"@en .

cgdn:MineralOccurrence
        rdf:type             rdfs:Resource , rdfs:Class , owl:Class;
        rdfs:comment         "CGDN Mineral Occurrence Class. Equivalent to ERML definition of mineral occurrence."@en , "Classe RDGC d'un indice minéralisé. Équivalent à la définition d'un indice minéralisé de ERML."@fr;
        rdfs:label           "Indice minéralisé"@fr , "Mineral Occurrence"@en;
        rdfs:subClassOf      rdfs:Resource , owl:Thing , cgdn:MineralOccurrence;
        cgdn:searchable      "true";
        owl:equivalentClass  cgdn:MineralOccurrence .

<http://localhost:8080/gsip/info/Infoset/BC/MineralOccurrence>
        rdfs:label  "CGDN BC mineral occurrence dataset"@en , "RDGC jeu de données d'indices minéralisés C-B"@fr .

_:b0    rdf:type          rdfs:Resource , owl:Thing;
        rdfs:label        "png: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP" , "C-B Indice minéralisé 082ENE001"@fr , "BC Mineral Occurrence 082ENE001"@en;
        cgdn:concretizes  <http://localhost:8080/gsip/info/Infoset/BC/MineralOccurrence>;
        dct:format        "image/png";
        owl:sameAs        _:b0;
        schema:geo        [ rdf:type          rdfs:Resource , schema:GeoCoordinates;
                            schema:Latitude   49.540833;
                            schema:Longitude  -118.3875
                          ];
        schema:name       "BC Mineral Occurrence 082ENE001"@en , "C-B Indice minéralisé 082ENE001"@fr , "png: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP";
        schema:provider   <https://www2.gov.bc.ca/gov/content?id=279686BC782F47ECA7B257376391D210>;
        schema:url        "http://apps.empr.gov.bc.ca/geoserver/cgi/ows?request=GetMap&service=WMS&version=1.1.1&LAYERS=cgi%3ACA-BC-BCGS-mineral-occurrences&STYLES=&FORMAT=image/png&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&SRS=EPSG:4326&BBOX=-118.8875,49.040833,-117.8875,50.040833&WIDTH=400&HEIGHT=300&query_layers=cgi%3ACA-BC-BCGS-mineral-occurrences&maxFeatures=10&CQL_FILTER=identifier='082ENE001'" .

_:b1    rdf:type          owl:Thing , rdfs:Resource;
        rdfs:label        "BC Mineral Occurrence 082ENE001"@en , "html: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP" , "C-B Indice minéralisé 082ENE001"@fr;
        cgdn:concretizes  <http://localhost:8080/gsip/info/Infoset/BC/MineralOccurrence>;
        dct:format        "text/html";
        owl:sameAs        _:b1;
        schema:geo        [ rdf:type          rdfs:Resource , schema:GeoCoordinates;
                            schema:Latitude   49.540833;
                            schema:Longitude  -118.3875
                          ];
        schema:name       "BC Mineral Occurrence 082ENE001"@en , "C-B Indice minéralisé 082ENE001"@fr , "html: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP";
        schema:provider   <https://www2.gov.bc.ca/gov/content?id=279686BC782F47ECA7B257376391D210>;
        schema:url        "http://minfile.gov.bc.ca/Summary.aspx?minfilno=082ENE001" .

_:b2    rdf:type          rdfs:Resource , owl:Thing;
        rdfs:label        "C-B Indice minéralisé 082ENE001"@fr , "erml: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP" , "BC Mineral Occurrence 082ENE001"@en;
        cgdn:concretizes  <http://localhost:8080/gsip/info/Infoset/BC/MineralOccurrence>;
        dct:format        "application/gml+xml;subtype=erml";
        owl:sameAs        _:b2;
        schema:geo        [ rdf:type          rdfs:Resource , schema:GeoCoordinates;
                            schema:Latitude   49.540833;
                            schema:Longitude  -118.3875
                          ];
        schema:name       "BC Mineral Occurrence 082ENE001"@en , "erml: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP" , "C-B Indice minéralisé 082ENE001"@fr;
        schema:provider   <https://www2.gov.bc.ca/gov/content?id=279686BC782F47ECA7B257376391D210>;
        schema:url        "http://apps.empr.gov.bc.ca/geoserver/cgi/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=cgi%3ACA-BC-BCGS-mineral-occurrences&outputFormat=gml32&maxFeatures=10&CQL_FILTER=identifier='082ENE001'" .

rdfs:Resource  rdf:type      owl:Class , rdfs:Resource , rdfs:Class;
        rdfs:subClassOf      rdfs:Resource;
        owl:equivalentClass  rdfs:Resource .

<http://localhost:8080/gsip/id/MineralOccurrence/BC/082ENE001>
        rdf:type          rdfs:Resource , owl:Thing , cgdn:MineralOccurrence;
        rdfs:label        "CGDN Mineral Occurrence BC 082ENE001"@en , "RDGC Indice minéralisé BC 082ENE001"@fr;
        owl:sameAs        <http://localhost:8080/gsip/id/MineralOccurrence/BC/082ENE001>;
        schema:name       "CGDN Mineral Occurrence BC 082ENE001"@en , "RDGC Indice minéralisé BC 082ENE001"@fr;
        schema:subjectOf  _:b0 , _:b2 , _:b1 , _:b3 .

_:b3    rdf:type          owl:Thing , rdfs:Resource;
        rdfs:label        "C-B Indice minéralisé 082ENE001"@fr , "BC Mineral Occurrence 082ENE001"@en , "csv: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP";
        cgdn:concretizes  <http://localhost:8080/gsip/info/Infoset/BC/MineralOccurrence>;
        dct:format        "text/csv";
        owl:sameAs        _:b3;
        schema:geo        [ rdf:type          rdfs:Resource , schema:GeoCoordinates;
                            schema:Latitude   49.540833;
                            schema:Longitude  -118.3875
                          ];
        schema:name       "C-B Indice minéralisé 082ENE001"@fr , "BC Mineral Occurrence 082ENE001"@en , "csv: MCKINLEY, MCKINLEY (L.140S), FRANKLIN CAMP";
        schema:provider   <https://www2.gov.bc.ca/gov/content?id=279686BC782F47ECA7B257376391D210>;
        schema:url        "http://apps.empr.gov.bc.ca/geoserver/cgi/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=cgi%3ACA-BC-BCGS-mineral-occurrences&outputFormat=csv&maxFeatures=10&CQL_FILTER=identifier='082ENE001'" .

denevers commented 1 month ago

The page organises the datasets by "provider" (BC and GSC), so some code in ModelWrapper is provided to pull this information. Since the Landing Page can be about a real thing or an Infoset, the path to get to the dataset is either through 'subjectOf' (isNir = true) or 'concretizedBy' (isNir = false).

/**
     * Get all the representations of the context resource that have this provider
     * @param provider
     * @return
     */
    public List<Resource> getRepresentationByProvider(Resource provider,boolean isNir)
    {

        return getRepresentations(isNir).stream().filter(m -> getProviders(m).contains(provider)).collect(Collectors.toList());
    }

    /**
     * Get all the representations of a specific context resource that have this provider
     * @param context
     * @param provider
     * @return
     */
    public List<Resource> getRepresentationByProvider(Resource context,Resource provider,boolean isNir)
    {
        return getRepresentations(context,getDsProperty(isNir)).stream().filter(m -> getProviders(m).contains(provider)).collect(Collectors.toList());
    }
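
The isNir switch and the provider filter can be sketched without Jena. This is an illustration under assumptions: ProviderFilter, Dataset, dsProperty and byProvider are hypothetical stand-ins, and the body of the real getDsProperty is not shown in this ticket, so the mapping below only follows the description above (subjectOf when isNir is true, concretizedBy otherwise).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ProviderFilter {
    // Hypothetical plain-Java stand-in for a dataset resource and its providers.
    static class Dataset {
        final String format;
        final List<String> providers;

        Dataset(String format, List<String> providers) {
            this.format = format;
            this.providers = providers;
        }
    }

    // Pick the linking property: subjectOf for a real thing, concretizedBy for an Infoset.
    static String dsProperty(boolean isNir) {
        return isNir ? "schema:subjectOf" : "cgdn:concretizedBy";
    }

    // Keep only the datasets that list the given provider.
    static List<Dataset> byProvider(List<Dataset> datasets, String provider) {
        return datasets.stream()
                .filter(d -> d.providers.contains(provider))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Dataset> all = Arrays.asList(
            new Dataset("text/html", Arrays.asList("BC")),
            new Dataset("text/csv", Arrays.asList("GSC")));
        System.out.println(dsProperty(true) + " -> " + byProvider(all, "BC").size());
    }
}
```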

denevers commented 1 month ago

Some code improvements, and a fix for a vexing issue with domain name changes for the ontology, are in commit 63153a6b030837e8418d1153c22ffdf067ad2680.

denevers commented 2 weeks ago

Need to refresh the documentation with the latest changes.

denevers commented 2 weeks ago

I realised that the ontology changed while this issue was being documented. I made changes to portions of the ticket, but the narrative misses important points. Better to check docs/infoset_ontology.adoc.