spdx / spdx-3-model

How to extract the NamespaceMap for SpdxDocument from RDF #557

Open maxhbr opened 9 months ago

maxhbr commented 9 months ago

Since https://github.com/spdx/spdx-3-model/pull/491 was merged, tools are responsible for extracting the namespace map from the native namespace declarations present in RDF / JSON-LD. I tried this and have not yet found the right approach.

The Issue

Even an empty document contains many unrelated namespaces, some coming from RDF overhead and some from the SPDX spec:

DEBUG:   namespace: brick -> https://brickschema.org/schema/Brick#
DEBUG:   namespace: csvw -> http://www.w3.org/ns/csvw#
DEBUG:   namespace: dc -> http://purl.org/dc/elements/1.1/
DEBUG:   namespace: dcat -> http://www.w3.org/ns/dcat#
DEBUG:   namespace: dcmitype -> http://purl.org/dc/dcmitype/
DEBUG:   namespace: dcterms -> http://purl.org/dc/terms/
DEBUG:   namespace: dcam -> http://purl.org/dc/dcam/
DEBUG:   namespace: doap -> http://usefulinc.com/ns/doap#
DEBUG:   namespace: foaf -> http://xmlns.com/foaf/0.1/
DEBUG:   namespace: geo -> http://www.opengis.net/ont/geosparql#
DEBUG:   namespace: odrl -> http://www.w3.org/ns/odrl/2/
DEBUG:   namespace: org -> http://www.w3.org/ns/org#
DEBUG:   namespace: prof -> http://www.w3.org/ns/dx/prof/
DEBUG:   namespace: prov -> http://www.w3.org/ns/prov#
DEBUG:   namespace: qb -> http://purl.org/linked-data/cube#
DEBUG:   namespace: schema -> https://schema.org/
DEBUG:   namespace: sh -> http://www.w3.org/ns/shacl#
DEBUG:   namespace: skos -> http://www.w3.org/2004/02/skos/core#
DEBUG:   namespace: sosa -> http://www.w3.org/ns/sosa/
DEBUG:   namespace: ssn -> http://www.w3.org/ns/ssn/
DEBUG:   namespace: time -> http://www.w3.org/2006/time#
DEBUG:   namespace: vann -> http://purl.org/vocab/vann/
DEBUG:   namespace: void -> http://rdfs.org/ns/void#
DEBUG:   namespace: wgs -> https://www.w3.org/2003/01/geo/wgs84_pos#
DEBUG:   namespace: owl -> http://www.w3.org/2002/07/owl#
DEBUG:   namespace: rdf -> http://www.w3.org/1999/02/22-rdf-syntax-ns#
DEBUG:   namespace: rdfs -> http://www.w3.org/2000/01/rdf-schema#
DEBUG:   namespace: xsd -> http://www.w3.org/2001/XMLSchema#
DEBUG:   namespace: xml -> http://www.w3.org/XML/1998/namespace
DEBUG:   namespace: ai -> https://spdx.org/rdf/v3/AI/
DEBUG:   namespace: build -> https://spdx.org/rdf/v3/Build/
DEBUG:   namespace: core -> https://spdx.org/rdf/v3/Core/
DEBUG:   namespace: dataset -> https://spdx.org/rdf/v3/Dataset/
DEBUG:   namespace: expandedlicensing -> https://spdx.org/rdf/v3/ExpandedLicensing/
DEBUG:   namespace: licensing -> https://spdx.org/rdf/v3/Licensing/
DEBUG:   namespace: ns0 -> http://www.w3.org/2003/06/sw-vocab-status/ns#
DEBUG:   namespace: security -> https://spdx.org/rdf/v3/Security/
DEBUG:   namespace: simplelicensing -> https://spdx.org/rdf/v3/SimpleLicensing/
DEBUG:   namespace: software -> https://spdx.org/rdf/v3/Software/

This makes it hard to identify the manually introduced namespaces.

Question: how would one extract only the part of that mapping that was intentionally chosen by the creator of the document?

I see no easy answer here.

goneall commented 8 months ago

One approach would be to filter out all namespaces that are in the standard SPDX context file. This would leave you with additional namespaces added beyond the SPDX spec.
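
A minimal sketch of this filtering approach in plain Python, under stated assumptions: `bindings` is the prefix-to-IRI mapping reported by the RDF library, and `known_iris` is a set the tool has to assemble itself (for example from the SPDX context file plus whatever the RDF library binds by default). The function name, the `ex` prefix, and the `https://example.com/sbom/` IRI are hypothetical and only there for illustration.

```python
from typing import Dict, Iterable


def custom_namespaces(bindings: Dict[str, str],
                      known_iris: Iterable[str]) -> Dict[str, str]:
    """Return only the bindings whose IRI is not in the known set."""
    known = set(known_iris)
    return {prefix: iri for prefix, iri in bindings.items() if iri not in known}


# A few bindings taken from the debug output above, plus one hypothetical
# namespace ("ex") standing in for something the document creator added.
bindings = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "core": "https://spdx.org/rdf/v3/Core/",
    "software": "https://spdx.org/rdf/v3/Software/",
    "ex": "https://example.com/sbom/",  # hypothetical
}

# Assumption: in a real tool this set would be loaded from the SPDX context
# file and the RDF library's defaults rather than hard-coded.
known_iris = {
    "http://xmlns.com/foaf/0.1/",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "https://spdx.org/rdf/v3/Core/",
    "https://spdx.org/rdf/v3/Software/",
}

remaining = custom_namespaces(bindings, known_iris)
```

Comparing on the IRI rather than on the prefix means a creator who rebinds a well-known prefix such as `core` to a different IRI would still show up in the result.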

Another approach would be to filter out all namespaces that are known to be part of property and type specifications.
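
One possible reading of this second approach, sketched in plain Python: collect the namespaces of every predicate and every `rdf:type` object that occurs in the graph, then drop those from the bindings. This interpretation, the helper names, the string-tuple triple representation, and the `https://example.com/sbom/` IRIs are all assumptions made for illustration, not something the SPDX spec prescribes.

```python
from typing import Dict, Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def namespace_of(iri: str) -> str:
    """Cut an IRI after its last '#' or '/' to get the namespace part."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[:cut + 1]


def vocabulary_namespaces(triples: Iterable[Triple]) -> Set[str]:
    """Namespaces used by predicates and by the objects of rdf:type."""
    used = set()
    for _subject, predicate, obj in triples:
        used.add(namespace_of(predicate))
        if predicate == RDF_TYPE:
            used.add(namespace_of(obj))
    return used


def non_vocabulary_bindings(bindings: Dict[str, str],
                            triples: Iterable[Triple]) -> Dict[str, str]:
    """Drop bindings whose IRI is a property or type namespace."""
    vocab = vocabulary_namespaces(triples)
    return {prefix: iri for prefix, iri in bindings.items() if iri not in vocab}


# Hypothetical two-triple graph: one typed element with a name.
triples = [
    ("https://example.com/sbom/doc", RDF_TYPE,
     "https://spdx.org/rdf/v3/Core/SpdxDocument"),
    ("https://example.com/sbom/doc",
     "https://spdx.org/rdf/v3/Core/name", "demo"),
]
bindings = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "core": "https://spdx.org/rdf/v3/Core/",
    "ex": "https://example.com/sbom/",  # hypothetical creator namespace
}

remaining = non_vocabulary_bindings(bindings, triples)
```

Note that this only removes namespaces that are actually used for properties or types in the document. Bindings that are declared but never used, such as library defaults, would still have to be filtered separately.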

I also don't think it would be an issue to include these additional namespaces in the namespace map - even though they may be redundant with the context file.

goneall commented 4 months ago

Moving to 3.1. If there is a need to document this, we can add it in that release.