w3c / dpv

Data Privacy Vocabularies and Controls CG (DPVCG)
https://w3id.org/dpv

Preserving older versions of DPV and other resources #45

Closed coolharsh55 closed 4 months ago

coolharsh55 commented 2 years ago

Currently the older versions (e.g. DPV, extensions, documentation) can only be accessed through git commits or by using older releases. This means they are not accessible through IRIs or online.

To remedy this, older versions can be provided through the IRIs /v/X.x where X.x refers to a specific version, e.g. 0.7.1. To implement this, the folder path /v needs to be created, and each version then copied into a directory named for that version's number. This is easier to do using scripts similar to those used for releases. The older releases would then be accessible through the existing IRI scheme as https://w3id.org/dpv/v/0.5 for DPV or https://w3id.org/dpv/v/0.5/dpv-gdpr for DPV-GDPR, and so on. The folders will contain their HTML documentation, and the remaining non-DPV resources (e.g. documentation generator, primer) will not be versioned in the same manner, to avoid replicating the entire repo.
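
A minimal sketch of the copy step described above, assuming a release is available as a local folder. The function name `archive_release` and the folder names in the demo are hypothetical and are not taken from the DPV release scripts.

```python
import shutil
import tempfile
from pathlib import Path


def archive_release(release_dir: Path, site_root: Path, version: str) -> Path:
    """Copy one release into <site_root>/v/<version> and return the new path."""
    target = site_root / "v" / version
    if target.exists():
        # Refuse to overwrite an already archived version.
        raise FileExistsError(f"version {version} is already archived at {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(release_dir, target)
    return target


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    release = root / "release-0.5"             # hypothetical release folder
    (release / "dpv-gdpr").mkdir(parents=True)
    (release / "index.html").write_text("DPV 0.5")

    archived = archive_release(release, root / "site", "0.5")
    archived_files = sorted(p.relative_to(root).as_posix() for p in archived.rglob("*"))
    print(archived_files)
```

Raising on an existing target is a design choice for the sketch: an archived version should stay unchanged once published.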

A caveat here is the increased space taken up by older releases. On average, a DPV release may be approx. 40MB in size, so as versions build up, the space taken can quickly cross 1GB. As a strategy, only the last iteration of each MAJOR.Minor version would be supported in this manner. For example, if DPV had releases 1.1.1 and 1.1.2 before moving to 1.2, then only 1.1.2 would be made available. (Note that here semantic versioning refers to MAJOR.Minor.fixes, where MAJOR refers to significant changes across all of DPV and Minor refers to additions or changes in some parts.)
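
The retention strategy in the example above (keep 1.1.2, drop 1.1.1) could be sketched as follows. The function names are hypothetical, and the sketch assumes version strings are purely numeric and dot-separated.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.1.2' into (1, 1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_fix_per_minor(versions: list[str]) -> list[str]:
    """Keep only the highest release within each MAJOR.Minor series."""
    best: dict[tuple[int, int], str] = {}
    for v in versions:
        nums = parse(v)
        # Treat a bare MAJOR (e.g. '2') as MAJOR.0.
        key = (nums[0], nums[1] if len(nums) > 1 else 0)
        if key not in best or nums > parse(best[key]):
            best[key] = v
    return [best[k] for k in sorted(best)]


releases = ["1.1.1", "1.1.2", "1.2", "0.7.1"]
kept = latest_fix_per_minor(releases)
print(kept)
```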

coolharsh55 commented 1 year ago

Another option is to deposit the documents as they are published with w3c/cg-reports which will maintain and provide live versions.

coolharsh55 commented 5 months ago

See https://lists.w3.org/Archives/Public/public-dpvcg/2024Jun/0001.html

What are versioned IRIs / namespaces?

The URL for using DPV is https://w3id.org/dpv, which doesn't indicate which version of DPV is being used. By default, it will always point to the latest version, i.e. what is published on GitHub. To allow adopters to continue to use specific versions, e.g. v2 when there is a v3, we want to create separate IRIs/URLs.

What will IRIs look like?

We continue using w3id (strongly recommended) and have:

w3id.org/dpv - always the latest version
w3id.org/dpv/legal/eu/gdpr - latest version of gdpr
w3id.org/dpv/v1 - v1
w3id.org/dpv/v2 - v2
w3id.org/dpv/v2.1 - v2.1
w3id.org/dpv/v2/tech - v2 tech
w3id.org/dpv/v1/legal/eu/gdpr - v1 gdpr
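
As an illustration only, the IRI pattern listed above can be expressed as a small helper. The function name `dpv_iri` is hypothetical, and the "v" prefix follows the scheme as proposed in this comment.

```python
from typing import Optional

BASE = "https://w3id.org/dpv"  # base IRI given in this thread


def dpv_iri(module: str = "", version: Optional[str] = None) -> str:
    """Build a DPV IRI; without a version it names the latest release."""
    parts = [BASE]
    if version:
        parts.append(f"v{version}")      # version segment comes right after the base
    if module:
        parts.append(module.strip("/"))  # e.g. 'tech' or 'legal/eu/gdpr'
    return "/".join(parts)


print(dpv_iri())                                  # latest DPV
print(dpv_iri("legal/eu/gdpr"))                   # latest GDPR extension
print(dpv_iri("tech", version="2"))               # v2 tech
print(dpv_iri("legal/eu/gdpr", version="1"))      # v1 GDPR extension
```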

How will the repo be structured?

Currently, we have all our stuff directly in the root folder - https://github.com/w3c/dpv. To have versions, we follow a structure similar to what the DCAT-AP repo uses: https://github.com/SEMICeu/DCAT-AP/tree/master/releases

So our DPV repo will look like:

root
 |-- v1
 |-- v2
   |-- dpv
   |-- pd
   |-- legal etc.
 |-- primer
 |-- guides
 |-- examples

What will be versioned?

Definitely the vocabulary, i.e. HTML and RDF files will be maintained for each version. Guides and other documents will not be versioned - they will remain where they are.

I see no value in maintaining separate copies of the Primer and other guides - as part of the publication process an archived copy is available via the w3c publishing process e.g. https://www.w3.org/community/reports/dpvcg/CG-FINAL-primer-20221205/ is the Primer from 2022.

How will this be maintained?

The size of each DPV release is approx. 100-150MB counting everything in it. Without all the media and other things, this can be reduced. So we should be okay to continue hosting this on Github for a while (>10 releases).

Another important question is whether we do any minor releases, e.g. v2.0.1, to fix typos or add in a few concepts. I am not in favour of this as it is an added burden to create a release without a lot of value. So for now, I am thinking we only have major releases i.e. v2, v2.1, and so on, and these will be versioned. If there is an emergency or urgent need for a small change, then we will do minor releases e.g. v2.0.1 with a crucial fix.

thovden commented 5 months ago

It's a bit unclear what v2 means when you also have v2.1. Will v2 always be the latest version of 2.x? Instead, I suggest using v2.0, v2.1 and so on, and not use v2. Alternatively, we need both v2 and v2.0 to avoid confusion.

coolharsh55 commented 5 months ago

Agreed. Using semantic versioning of form <major>.<minor> always is best practice. We'll do 1.0, 2.0, 2.1 and so on.

coolharsh55 commented 5 months ago

5432b00 implements this change and restructures the repo to have v1.0 and v2.0 folders containing the releases.

coolharsh55 commented 5 months ago

We'll be going ahead with this if no issues are identified by Friday JUN-21.

thovden commented 5 months ago

I assume the data exports (JSONLD, CSV, etc) will use the specific IRIs and namespaces for the resource - e.g., https://www.w3id.org/dpv/ai#AI for the latest version, https://www.w3id.org/dpv/v2.0/ai#AI for the latest v2.0 version and so on?

coolharsh55 commented 5 months ago

I assume the data exports (JSONLD, CSV, etc) will use the specific IRIs and namespaces for the resource - e.g., https://www.w3id.org/dpv/ai#AI for the latest version, https://www.w3id.org/dpv/v2.0/ai#AI for the latest v2.0 version and so on?

Yes, though the version-less IRI will always resolve to the latest version, which is currently v2.0. So ai:AI is defined with IRI https://www.w3id.org/dpv/v2.0/ai#AI, and https://www.w3id.org/dpv/ai#AI will resolve to this v2 IRI now, v3 in future, and so on.
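
A rough sketch of this mapping, assuming the base IRI https://w3id.org/dpv (without www) and 2.0 as the latest release. The function `resolve` is hypothetical and only illustrates the rewrite. In real HTTP the fragment (e.g. #AI) is not sent to the server, so an actual redirect would act on the document IRI only.

```python
import re

BASE = "https://w3id.org/dpv"   # assumed base IRI
LATEST = "2.0"                  # assumed latest release

# A versioned path starts with /v<digits>[.<digits>...] right after the base.
_VERSIONED = re.compile(r"/v\d+(\.\d+)*(/|#|$)")


def resolve(iri: str, latest: str = LATEST) -> str:
    """Map a version-less DPV IRI to the latest version; keep versioned IRIs."""
    if iri != BASE and not iri.startswith((BASE + "/", BASE + "#")):
        raise ValueError(f"not a DPV IRI: {iri}")
    rest = iri[len(BASE):]      # e.g. '/ai#AI', '#Purpose', '/v2.0/ai#AI' or ''
    if _VERSIONED.match(rest):
        return iri              # already pinned to a version
    return f"{BASE}/v{latest}{rest}"


print(resolve("https://w3id.org/dpv/ai#AI"))
print(resolve("https://w3id.org/dpv/ai#AI", latest="3.0"))
```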

nuthub commented 5 months ago

Do we need to update (some of) the examples to reflect that dpv without a version refers to the latest dpv? So v1 (v2) examples need to be fixed to explicitly refer to v1 (v2), which is not the case currently, I think.

coolharsh55 commented 5 months ago

Do we need to update (some of) the examples to reflect that dpv without a version refers to the latest dpv? So v1 (v2) examples need to be fixed to explicitly refer to v1 (v2), which is not the case currently, I think.

I thought about that, but then the next time we update DPV the examples would use an older version (IRI). So as of now the examples are without versioned IRIs, and they are used in documents and linked to concepts in RDF. If an example becomes outdated, we create a new one and link that to the document and concept in future. Makes sense?

nuthub commented 5 months ago

Indeed, you already patched the files (e.g. dpv/examples/dex-owl.owl) I was thinking of, after I checked the wrong branch.

coolharsh55 commented 5 months ago

Yes, the examples index files dex.ttl and dex-owl.ttl contain versioned IRIs but the examples themselves don't. So do we continue with the current setup or change it to something else?

coolharsh55 commented 5 months ago

I found https://more.metadatacenter.org/recommended-iri-patterns-ontologies-and-their-terms which gives some peace of mind that the approach we are taking to versioning IRIs and using unversioned IRI to always point to the latest version is sensible.

nuthub commented 5 months ago

I dare to highlight a sentence that might reduce the peace of mind again:

(Note: For this approach to be unambiguous, the version can never begin with an alphabetic character, and the resourceIdentifier can never begin with a digit. MMI COR enforces this by convention, and hopes no one will break it.)
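
To make the concern concrete, here is a small sketch of the convention quoted above. The function `classify_segment` is hypothetical and is not how MMI COR implements it. Under this rule, a segment such as v2.0 reads as a resource identifier rather than a version.

```python
def classify_segment(segment: str) -> str:
    """Classify a path segment by its first character, per the quoted convention."""
    if not segment:
        raise ValueError("empty segment")
    first = segment[0]
    if first.isascii() and first.isdigit():
        return "version"    # versions must not begin with an alphabetic character
    if first.isalpha():
        return "resource"   # resource identifiers must not begin with a digit
    return "unknown"


for seg in ("2.0", "v2.0", "ai"):
    print(seg, "->", classify_segment(seg))
```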

coolharsh55 commented 5 months ago

Should we replace v2.0 with 2.0 then? (should be doable)

coolharsh55 commented 5 months ago

Done - we have dpv/2.0 instead of dpv/v2.0 everywhere (IRIs, folder names) now. @thovden FYI for this change.

coolharsh55 commented 5 months ago

The best practice here recommends that we do the following:

  1. Use the base IRI for the ontology metadata e.g. https://w3id.org/dpv
  2. Use versioned IRI with owl:versionIRI e.g. https://w3id.org/dpv/2.0
  3. For terms do not use versioned IRI and use the base IRI i.e. https://w3id.org/dpv#Purpose is correct, and https://w3id.org/dpv/2.0#Purpose is wrong

Source: Garijo, D., & Poveda-Villalón, M. (2020). Best practices for implementing fair vocabularies and ontologies on the web. In Applications and practices in ontology design, extraction, and reasoning (pp. 39-54). IOS Press. https://arxiv.org/pdf/2003.13084

coolharsh55 commented 5 months ago

I have implemented the change as follows:

# previously
@prefix dpv: <https://w3id.org/dpv/2.0#> .
<https://w3id.org/dpv/2.0> a owl:Ontology ;
  owl:versionIRI <https://w3id.org/dpv/2.0> .

# changed to
@prefix dpv: <https://w3id.org/dpv#> .
<https://w3id.org/dpv> a owl:Ontology ;
  owl:versionIRI <https://w3id.org/dpv/2.0> .

With this there are no terms with IRIs containing version information - as per best practice. To be merged into main branch after making sure everything works as expected. With this change, the FOOPS! score should increase as well.
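
One way to check this property is to scan term IRIs for a version segment. This is only a sketch: the function `versioned_terms` and the sample IRIs are illustrative, and it assumes terms use fragment IRIs and versions are numeric and dot-separated, as in dpv/2.0.

```python
import re

# Matches DPV IRIs whose first path segment after /dpv/ is a numeric version.
_VERSIONED = re.compile(r"^https://w3id\.org/dpv/\d+(\.\d+)*(/|#|$)")


def versioned_terms(iris: list[str]) -> list[str]:
    """Return the term IRIs (those with a fragment) that embed a version."""
    return [iri for iri in iris if "#" in iri and _VERSIONED.match(iri)]


sample = [
    "https://w3id.org/dpv#Purpose",        # term on the base IRI: fine
    "https://w3id.org/dpv/2.0#Purpose",    # term with version: flagged
    "https://w3id.org/dpv/ai#AI",          # extension term on the base IRI: fine
    "https://w3id.org/dpv/2.0/ai#AI",      # extension term with version: flagged
    "https://w3id.org/dpv/2.0",            # owl:versionIRI, not a term: ignored
]
flagged = versioned_terms(sample)
print(flagged)
```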

coolharsh55 commented 5 months ago

Implemented the change. FOOPS! test passed for version IRI. The only remaining tests are for depositing DPV in LOV (which I'll do once we have a final release), and for the prefix in prefix.cc (which can only be fixed by voting).

coolharsh55 commented 4 months ago

Submitted all current vocabs to LOV (awaiting email from the system).

coolharsh55 commented 4 months ago

DPV added to LOV: https://lov.linkeddata.es/dataset/lov/vocabs/dpv

coolharsh55 commented 4 months ago

Deposited on Zenodo. DOI for all releases: 10.5281/zenodo.12505840