isamplesorg / metadata

Collation of metadata examples and notes for the project
https://isamplesorg.github.io/metadata/

NSF-2004562 NSF-2004815 NSF-2004839 NSF-2004642

metadata

Defines the core metadata model for iSamples.

src/schemas/iSamplesCoreSchema.yml defines the iSamples core model in LinkML. It references the vocabularies in src/vocabularies/, which define the terms for Material Type, Sampled Feature, and Specimen Type.

The following artifacts are generated from the linkml and vocabulary sources:

Development

LinkML and its associated tools require Python 3.9 or newer. This project uses Poetry for dependency management. Poetry can be installed with pip install poetry.
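As a quick pre-flight check before installing anything, a short script can confirm that the interpreter meets the 3.9 minimum stated above and report whether Poetry is on the PATH. This is only a sketch; the function names are illustrative and not part of the repository.

```python
import shutil
import sys

MINIMUM_PYTHON = (3, 9)  # minimum version stated in this README


def python_is_supported(version=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


def poetry_is_installed():
    """Return True if a poetry executable can be found on the PATH."""
    return shutil.which("poetry") is not None


print("Python OK:", python_is_supported())
print("Poetry found:", poetry_is_installed())
```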

To work on project contents and run artifact generators, first grab the source and switch to the develop branch:

git clone https://github.com/isamplesorg/metadata.git
cd metadata
git checkout develop
git pull

Set up a virtual environment (e.g. using poetry or mkvirtualenv):

poetry shell
poetry install

(To exit poetry shell, use exit).

Artifacts in the generated/ folder are produced by running make or make all.

Documentation is rendered with Quarto rather than the default mkdocs or Sphinx, because Quarto offers additional features for including computed examples, which are planned. To generate the documentation, install Quarto 1.2 or newer, then run make, make all, or make gen-docs.

This generates intermediate Markdown files in the build/docs folder, then invokes quarto render to produce the HTML docs in the docs/ folder.

Note that this project uses a version of the LinkML docgen tool and templates modified to render Markdown for Quarto. The modified docgen and templates are located in the tools/ folder.

Older notes below


linkML (Current version 1.1.15)

This branch shows how to use LinkML to generate various outputs and run operations for iSamples.

Current workflow (01/01/2022)

[Figure: workflow diagram]

iSamples YAML schema to JSON schema

Use the following command to convert the iSamples YAML schema to JSON Schema:

gen-json-schema -t PhysicalSampleRecord --not-closed iSamplesSchemaBasic0.3.yaml > iSamplesSchemaBasic0.3.schema.json 

In this command, -t PhysicalSampleRecord makes the PhysicalSampleRecord class the top-level class, so the properties of that class become the top-level properties of the JSON Schema. The --not-closed option leaves the schema open, so properties not declared in the schema are still accepted. The converted JSON Schema file is iSamplesSchemaBasic0.3.schema.json.
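To check the effect of the command, the generated file can be inspected with a few lines of Python. The sketch below assumes the generator writes the top-level class's properties under the root properties key and records whether the schema is closed in additionalProperties. The tiny schema used here is a hand-written stand-in, not real generator output, and summarize_schema is an illustrative name.

```python
import json


def summarize_schema(schema):
    """List root-level properties and report whether extra properties are allowed."""
    return {
        "top_level_properties": sorted(schema.get("properties", {})),
        # In JSON Schema, additional properties are allowed unless
        # "additionalProperties" is explicitly false.
        "open": schema.get("additionalProperties", True) is not False,
    }


# Hand-written stand-in for iSamplesSchemaBasic0.3.schema.json (illustrative only).
example_schema = json.loads("""
{
  "type": "object",
  "additionalProperties": true,
  "properties": {
    "label": {"type": "string"},
    "sampleidentifier": {"type": "string"}
  }
}
""")

summary = summarize_schema(example_schema)
print(summary)
```

For a real run, replace the stand-in with the result of json.load on the generated schema file.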

Generating JSON-LD context

gen-jsonld-context iSamplesSchemaBasic0.3.yaml > iSampleSchemaBasic0.3.jsonld

The command saves the result to the .jsonld file. Once the JSON-LD context has been generated, its enumeration entries must be edited by hand.

Modified JSON-LD context example
   "@context": {
      "dct": "http://purl.org/dc/terms/",
      "isam": "http://resource.isamples.org/schema/",
      "mat": "http://resource.isamples.org/vocabulary/material/",
      "pur": "http://resource.isamples.org/vocabulary/samplepurpose/",
      "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
      "sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
      "skos": "http://www.w3.org/2004/02/skos/core#",
      "spt": "http://resource.isamples.org/vocabulary/specimentype/",
      "w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "@vocab": "http://resource.isamples.org/schema/",
      "curation": {
         "@type": "@id"
      },
      "hasContextCategory": {
         "@type":"contextcategory"
      },
      "hasMaterialCategory": {
         "@type":"materialtype"
      },
      "hasSpecimenCategory": {
         "@type":"specimencategory"
      },
      "id": "@id",
      "latitude": {
         "@type": "xsd:decimal"
      },
      "location": {
         "@type": "@id"
      },
      "longitude": {
         "@type": "xsd:decimal"
      },
      "producedBy": {
         "@type": "@id"
      },
      "relatedResource": {
         "@type": "@id"
      },
      "resultTime": {
         "@type": "xsd:date"
      },
      "samplingSite": {
         "@type": "@id"
      }
   }

This is an example of a modified JSON-LD context. For each enumeration, @type declares the enumeration type.
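The manual edit could also be scripted. The sketch below assumes the generated file is a JSON object with a top-level @context key, and that the three enumeration-valued terms and their type names are the ones shown in the example above. The function name and the mapping constant are illustrative, not part of the repository.

```python
import copy
import json

# Term -> enumeration type name, copied from the modified context shown above.
ENUM_TYPES = {
    "hasContextCategory": "contextcategory",
    "hasMaterialCategory": "materialtype",
    "hasSpecimenCategory": "specimencategory",
}


def patch_enum_terms(context_doc, enum_types=ENUM_TYPES):
    """Return a copy of the context document with enumeration terms typed."""
    patched = copy.deepcopy(context_doc)
    context = patched.setdefault("@context", {})
    for term, type_name in enum_types.items():
        context[term] = {"@type": type_name}
    return patched


# Small stand-in for the generated .jsonld file (illustrative only).
generated = {
    "@context": {
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "@vocab": "http://resource.isamples.org/schema/",
        "id": "@id",
    }
}

modified = patch_enum_terms(generated)
print(json.dumps(modified, indent=3))
```

The input document is left untouched, so the generated file can be kept alongside the modified one for comparison.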

Validating schema and instance file

Before validating the instance files, add the modified JSON-LD context at the top of each instance, ahead of its properties.

Full instance example
{
   "@context": {
      "dct": "http://purl.org/dc/terms/",
      "isam": "http://resource.isamples.org/schema/",
      "mat": "http://resource.isamples.org/vocabulary/material/",
      "pur": "http://resource.isamples.org/vocabulary/samplepurpose/",
      "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
      "sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
      "skos": "http://www.w3.org/2004/02/skos/core#",
      "spt": "http://resource.isamples.org/vocabulary/specimentype/",
      "w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "@vocab": "http://resource.isamples.org/schema/",
      "curation": {
         "@type": "@id"
      },
      "hasContextCategory": {
         "@type":"contextcategory"
      },
      "hasMaterialCategory": {
         "@type":"materialtype"
      },
      "hasSpecimenCategory": {
         "@type":"specimencategory"
      },
      "id": "@id",
      "latitude": {
         "@type": "xsd:decimal"
      },
      "location": {
         "@type": "@id"
      },
      "longitude": {
         "@type": "xsd:decimal"
      },
      "producedBy": {
         "@type": "@id"
      },
      "relatedResource": {
         "@type": "@id"
      },
      "resultTime": {
         "@type": "xsd:date"
      },
      "samplingSite": {
         "@type": "@id"
      }
   },

    "@schema": "../../iSamplesSchemaBasic0.2.json",
    "@id": "metadata/21547/Car2PIRE_0334",
    "label": "PIRE_0334",
    "sampleidentifier": "ark:/21547/Car2PIRE_0334",
    "description": "",
    "hasContextCategory": ["Marine Biome"],
    "hasMaterialCategory": ["Organic Material"],
    "hasSpecimenCategory": ["Whole Organism"],
    "informalClassification": ["Gastropoda"],
    "keywords": ["Aceh", "Sumatra","Indonesia","Asia", "Mollusca"],
    "producedBy": {
        "@id":"ark:/21547/Cas2INDO_2016_SEU_1B",
        "label": "INDO_2016_SEU_1B",
        "description": "expeditionCode: INDO_PIRE | samplingProtocol: ARMS | taxonomy team: MINV | projectId: 80",
        "hasFeatureOfInterest": "coral reef",
        "responsibility": ["Aji Wahyu Anggoro","Andrianus Sembiring"],
        "resultTime": "2016-08-09",
        "samplingSite": {
            "description": "Shallow, coastal reef. Apparent exposure to current, Porites dominated. Less impacted bleaching site, high recruitment, 12 m.",
            "label": "",
            "location": {
                "elevation": "maximumDepthInMeters: 12",
                "latitude": 5.89430,
                "longitude": 95.25293
            },
            "placeName": ["Pulau Seulako"]
        }
    },
    "registrant": "Chris Meyer",
    "samplingPurpose": "genomic analysis",
    "curation": {
        "accessConstraints": "",
        "curationLocation": "",
        "responsibility": ""
    },
    "relatedResource": {
        "label":"subsample tissue",
        "description":"",
        "target":"ark:/21547/Cat2INDO106431.1",
        "relationship":"subsample"
    }
}
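Placing the context ahead of the instance properties, as in the example above, can be sketched in Python as follows. The helper name add_context and the trimmed stand-in records are assumptions made for illustration; a real run would load the modified context and each instance file with json.load.

```python
import json


def add_context(instance, context_doc):
    """Return a new record with "@context" as its first key."""
    body = {key: value for key, value in instance.items() if key != "@context"}
    # dicts preserve insertion order, so "@context" is serialized first.
    return {"@context": context_doc["@context"], **body}


# Trimmed stand-ins for the modified context and an instance record.
context_doc = {
    "@context": {"@vocab": "http://resource.isamples.org/schema/", "id": "@id"}
}
instance = {"@id": "metadata/21547/Car2PIRE_0334", "label": "PIRE_0334"}

record = add_context(instance, context_doc)
print(json.dumps(record, indent=4))
```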

Use the following commands to validate the instance files against the schema:

linkml-validate -s iSamplesSchemaBasic0.3.yaml instance.json
jsonschema -i instance.json iSamplesSchemaBasic0.3.schema.json

The first command validates the instance file against the YAML schema. The second validates it against the JSON Schema.
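When many instance files need checking, both validators can be driven from a small script. The sketch below only wraps the two command lines shown above. The helper names are illustrative, and a tool that is not found on the PATH is skipped rather than treated as a failure.

```python
import shutil
import subprocess


def validation_commands(instance, yaml_schema, json_schema):
    """Build the two validator invocations shown above as argument lists."""
    return [
        ["linkml-validate", "-s", yaml_schema, instance],
        ["jsonschema", "-i", instance, json_schema],
    ]


def run_validators(commands):
    """Run each command; map tool name to exit code, or None if not installed."""
    results = {}
    for command in commands:
        tool = command[0]
        if shutil.which(tool) is None:
            results[tool] = None  # tool not found on PATH, skipped
            continue
        results[tool] = subprocess.run(command).returncode
    return results


commands = validation_commands(
    "instance.json",
    "iSamplesSchemaBasic0.3.yaml",
    "iSamplesSchemaBasic0.3.schema.json",
)
for command in commands:
    print(" ".join(command))
```

Calling run_validators(commands) from the directory that holds these files runs both checks and returns each tool's exit code, where 0 conventionally indicates success.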

Run tools in a Docker container

The iSamples Metadata Docker image is based on the Docker image from the LinkML project [https://hub.docker.com/r/monarchinitiative/linkml/tags].

First you'll build the image: docker build -t isamples_linkml .

Running the image then opens a bash shell in /work, the container volume mapped to the iSamples metadata repository: docker run -a stdin -a stdout -i -t -v `pwd`:/work isamples_linkml

Then use the following commands to generate LinkML:

To do