iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.odis.org/

Issue Dataset schema.org #232

Open smrgeoinfo opened 1 year ago

smrgeoinfo commented 1 year ago

The triples generated from the example at https://book.oceaninfohub.org/thematics/dataset/index.html#id1 end up using the URI of the metadata document (https://registry.org/permanentUrlToThisJsonDoc) as the subject, giving triples like:

permanentUrlToThisJsonDoc description "description of what's in the dataset" [it's a description of the dataset, not of the JsonDoc...]
permanentUrlToThisJsonDoc license "license for the dataset" [not the license for the JsonDoc...]

Doug and I have recently discussed this issue in the CDIF context. I'd suggest that the schema.org metadata separate the statements about the metadata record from the statements about the resource that the metadata describes, using schema:about:

{
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "DigitalDocument",
    "@id": "https://registry.org/permanentUrlToThisJsonDoc",
    "includedInDataCatalog": {
        "@id": "https://registryOfCatalogs.org/permanentUrlIdentifiyingCatalog",
        "@type": "DataCatalog",
        "url": "https://urlOfDataCatalog.org"
    },
    "about": {
        "@id": "https://identifier.org/identifierForThisDataset",
        "@type": "Dataset",
        "name": "A concise but descriptive name of the dataset",
        "description": "An extended, free-text description of what's in the dataset, who created it, and other attributes",
        "url": "https://urlToTheDatasetOrLandingPage.org/",
        "encodingFormat": "text/csv",
        "sameAs": ["http://alternativeUrlToTheDatasetOrLandingPage.org"],
        "license": "This work is licensed under a Creative Commons Attribution (CC-BY) 4.0 License",
        "citation": [
            "Citation to other work relevant to this dataset [this is not a good way to put links to related resources; people understand 'citation' differently...]"
        ],
        "version": "2021-04-24T06:34:56.000Z",
        "keywords": [
            "Keyword 1",
            "Keyword 2",
            "Keyword 3"
        ],
        "measurementTechnique": "The URL to or text about the methods, technique or technology used to generate this Dataset; is measurementTechnique specific to a variable or to all variables, e.g. the dataset...",
        "variableMeasured": [
            {
                "@type": "PropertyValue",
                "name": "Name of a variable in the dataset",
                "description": "Extended description of this variable"
            },
            {
                "@type": "PropertyValue",
                "name": "Name of a variable in the dataset",
                "url": "http://ontology.org/uriToSemanticDescriptorOfThisVariable_Should_use_PropertyID_if_this_is_a_URI",
                "description": "Extended description of this variable?"
            },
            {
                "@type": "PropertyValue",
                "name": "SamplingDeviceApertureSurfaceArea",
                "url": "http://ontology.org/uriToSemanticDescriptorOfThisVariable",
                "description": "Extended description of this variable"
            }
        ],
        "temporalCoverage": "2007/2007",
        "spatialCoverage": {
            "@type": "Place",
            "geo": {
                "@type": "GeoShape",
                "polygon": "142.014 10.161667,142.014 18.033833,147.997833 18.033833,147.997833 10.161667,142.014 10.161667"
            },
            "additionalProperty": {
                "@type": "PropertyValue",
                "propertyID": "http://dbpedia.org/resource/Spatial_reference_system",
                "value": "http://www.w3.org/2003/01/geo/wgs84_pos#lat_long"
            }
        },
        "producer": [
            {
                "@type": "Organization",
                "legalName": "Legal Name of Organisation which generated the dataset",
                "name": "Other Name of Organisation which generated the dataset",
                "url": "https://organisationWebsite.org/"
            }
        ],
        "about": {
            "@type": "Event",
            "@id": "https://cruises.org/cruiseID_or_better_yet_an_actual_observaton_event",
            "description": "Describe the event which this dataset documents. For example, a cruise ID.",
            "name": "Concise and descriptive name of the Event",
            "agent": [
                "Name or permanent ID of person or thing that performed this action [should be a schema.org Person]",
                "Name or permanent ID of person or thing that performed this action",
                "Name or permanent ID of person or thing that performed this action"
            ],
            "startDate": "2007-03-11T14:45Z",
            "endDate": "2007-03-11T15:42Z",
            "hasPart": {
                "@type": "Action",
                "name": "Concise but descriptive name of action that was part of an Event. For example, the name of a CTD cast",
                "instrument": {
                    "@type": "Thing",
                    "name": "The name of the instrument used in the action. For example, the specific model of a CTD, a glider, a moored sensor",
                    "url": "http://ontology.org/uriToSemanticDescriptorOfThisInstrument",
                    "description": "Extended description of the sampling instrument"
                }
            }
        }
    }
}
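
To make the difference concrete, here is a rough sketch of how the subject of each triple changes once the Dataset is nested under about. This is not a real JSON-LD processor: to_triples is a hypothetical helper that treats each node's @id as the subject and ignores @context, and both documents are trimmed-down stand-ins for the examples above.

```python
from itertools import count

_blank = count()  # counter for blank-node labels when a node has no @id


def to_triples(node):
    """Return (subject, triples) for one JSON-LD-like dict (simplified)."""
    subject = node.get("@id") or f"_:b{next(_blank)}"
    out = []
    for key, value in node.items():
        if key in ("@id", "@context"):
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # A nested node becomes its own subject, linked from the parent.
                child, child_triples = to_triples(v)
                out.append((subject, key, child))
                out.extend(child_triples)
            else:
                out.append((subject, key, v))
    return subject, out


def subjects_of(doc, predicate):
    """Collect every subject that carries the given predicate."""
    _, triples = to_triples(doc)
    return {s for s, p, _ in triples if p == predicate}


DOC = "https://registry.org/permanentUrlToThisJsonDoc"
DATASET = "https://identifier.org/identifierForThisDataset"

# Current pattern: dataset properties hang off the metadata document's URI.
flat = {"@id": DOC, "@type": "Dataset", "license": "license for the dataset"}

# Proposed pattern: the document is *about* a Dataset with its own identifier.
nested = {
    "@id": DOC,
    "@type": "DigitalDocument",
    "about": {"@id": DATASET, "@type": "Dataset",
              "license": "license for the dataset"},
}

print(subjects_of(flat, "license"))    # the metadata document URI
print(subjects_of(nested, "license"))  # the dataset identifier
```

With the nested form, the only statements left on the metadata document URI are the ones that really are about the record (its type, its catalog, and the about link itself).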

I've also tweaked how the Event and the Action that is part of the Event are serialized. We also have to consider whether measurementTechnique is variable-specific (in many cases it won't be) or applies to the whole dataset.
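
One possible way to support both cases, sketched below, is to let a variable carry its own measurementTechnique and fall back to the dataset-level value when it doesn't. This is only an illustration of the idea, not an agreed pattern: technique_for is a hypothetical helper, the variable names and technique strings are made up, and it assumes measurementTechnique is acceptable on PropertyValue as well as on Dataset.

```python
def technique_for(dataset, variable_name):
    """Return the variable's own measurementTechnique, else the dataset's."""
    for var in dataset.get("variableMeasured", []):
        if var.get("name") == variable_name:
            # Variable-level value wins; otherwise inherit from the dataset.
            return var.get("measurementTechnique",
                           dataset.get("measurementTechnique"))
    raise KeyError(variable_name)


# Hypothetical dataset: one variable overrides the dataset-level technique.
dataset = {
    "@type": "Dataset",
    "measurementTechnique": "CTD cast",
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "temperature"},
        {"@type": "PropertyValue", "name": "chlorophyll",
         "measurementTechnique": "fluorometry"},
    ],
}

print(technique_for(dataset, "temperature"))  # inherited from the dataset
print(technique_for(dataset, "chlorophyll"))  # variable-specific value
```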