w3c / dpv

Data Privacy Vocabularies and Controls CG (DPVCG)
https://w3id.org/dpv

In 27560-records, how to identify the latest consent state? #114

Open coolharsh55 opened 10 months ago

coolharsh55 commented 10 months ago

In implementing #90 for ISO/IEC 27560 records, consent events are logged to indicate changes (or reaffirmations) in the state of consent, e.g. from given to withdrawn and to given again. This results in multiple events with distinct timestamps for provenance. For convenience and efficiency, how should we express which event is the 'latest' and, by extension, the current state of consent?

This was discussed in the 2023-09-27 meeting, and the following options were proposed. Of these, Option 3 was suggested as the best recourse.

Option 1: express state directly without a link to the matching event - offers convenience of directly providing the state but not provenance to indicate which event caused the state

{ "ex:ConsentRecord": {
    "dpv:hasConsentStatus": "dpv:ConsentConfirmed",
    "dct:provenance": [{
        "@id": "Event#3",
        "@type": "dpv:ConsentConfirmed"
    }, {
        "@id": "Event#2",
        "@type": "dpv:ConsentWithdrawn"
    }, {
        "@id": "Event#1",
        "@type": "dpv:ConsentGiven"
    }]
}}

Option 2: express events as a nested structure such that the later event 'envelops' the earlier ones, thus leading to the latest event always being the 'first' or 'external'.

{ "ex:ConsentRecord": {
    "dpv:hasConsentStatus": {
        "@id": "Event#3",
        "@type": "dpv:ConsentConfirmed",
        "dct:provenance": [{
            "@id": "Event#2",
            "@type": "dpv:ConsentWithdrawn"
        }, {
            "@id": "Event#1",
            "@type": "dpv:ConsentGiven",
            "dpv:hasLocation": "https://example.com"
        }]
    }
}}

Option 3: express the 'id' of the event that caused the change, which provides provenance to the latest event, but requires some identifier to be present for each event

{ "ex:ConsentRecord": {
    "dpv:hasConsentStatus": "Event#3",
    "dct:provenance": [{
        "@id": "Event#3",
        "@type": "dpv:ConsentConfirmed"
    }, {
        "@id": "Event#2",
        "@type": "dpv:ConsentWithdrawn"
    }, {
        "@id": "Event#1",
        "@type": "dpv:ConsentGiven"
    }]
}}
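
As a rough illustration of how a consumer might read Option 3, the sketch below treats the record as plain JSON (no JSON-LD processing) and resolves the identifier in dpv:hasConsentStatus against the listed events. The helper name `current_status` is hypothetical and only used for this example.

```python
import json

# The Option 3 record from above, parsed as plain JSON.
record = json.loads("""
{ "ex:ConsentRecord": {
    "dpv:hasConsentStatus": "Event#3",
    "dct:provenance": [
        {"@id": "Event#3", "@type": "dpv:ConsentConfirmed"},
        {"@id": "Event#2", "@type": "dpv:ConsentWithdrawn"},
        {"@id": "Event#1", "@type": "dpv:ConsentGiven"}
    ]
}}
""")


def current_status(record: dict) -> str:
    """Return the @type of the event referenced by dpv:hasConsentStatus."""
    body = record["ex:ConsentRecord"]
    events_by_id = {event["@id"]: event for event in body["dct:provenance"]}
    # Raises KeyError if the referenced event is missing from the log.
    latest = events_by_id[body["dpv:hasConsentStatus"]]
    return latest["@type"]


print(current_status(record))
```

The lookup gives both the state and the event that caused it, which is the provenance that Option 1 lacks. The cost is that every event needs an identifier.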

coolharsh55 commented 10 months ago

For additional context, I used 'chaining' of events in GConsent to have a path to the latest event, and left it as an implementation detail to create additional information that directly indicates the latest consent state. See https://harshp.com/GConsent/#onto-diagram-invalidate which uses isPreviousConsentFor between the two events.

pmcb55 commented 10 months ago

Just my 2c, but I'd use PROV-O terms instead of the far more limited DCTERMS (as all it provides are dcterms:ProvenanceStatement and dcterms:provenance). But maybe you have reasons to avoid PROV-O (or these are just simplified examples)...?!

I still like the chaining of events used in GConsent, although I'd reverse the direction of the relationship to point back to the previous instance - i.e., I'd drop isPreviousConsentFor and use prov_o:wasRevisionOf to point from the new/latest event back to the previous one (i.e., so that the previous event can be thought of as being always immutable, i.e., by not having a new triple associated with its RDF Subject).

In Option 3 above, I don't see this chaining at all, and in fact, I can't see any 'order' between "Event#1", "Event#2", and "Event#3" (since RDF values are inherently unordered). I'm guessing the example is just being deliberately simplified, but I can't see the proper provenance linkage (maybe using Turtle instead of JSON-LD would help clarify?).

coolharsh55 commented 10 months ago

Hi. That's an interesting question - thanks. Events are expected to have timestamps which specify the order. 27560 doesn't mention anything about how events are recorded - so implementations are free to choose PROV-O or other ways to denote links or order.

For using PROV-O, the wasRevisionOf would not be the correct choice since we are talking about two events - neither is a 'revision' of the other. If it was two separate records with the later record being an updated iteration - then associating the two records with wasRevisionOf would be correct. I am also avoiding PROV-O to keep the model as simple as possible (you can still use PROV-O for your use-cases) and to match what 27560 specifies.

In DPV, we do have isBefore and isAfter without any restrictions - so that could be used to 'chain' the events. GConsent can also be used, but for DPV, the properties have a broader context and can be used for many things, including other legal basis e.g. contract events.
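
A minimal sketch of what such chaining could look like in practice, assuming each later event points to its predecessor with dpv:isAfter. The event data and the helper `latest_event` are made up for illustration. The latest event is then the one that no other event claims to come after, regardless of the order in which the events are stored.

```python
# Hypothetical events: each later event names its predecessor via dpv:isAfter.
events = [
    {"@id": "Event#1", "@type": "dpv:ConsentGiven"},
    {"@id": "Event#2", "@type": "dpv:ConsentWithdrawn", "dpv:isAfter": "Event#1"},
    {"@id": "Event#3", "@type": "dpv:ConsentConfirmed", "dpv:isAfter": "Event#2"},
]


def latest_event(events):
    """Return the single event that is not the predecessor of any other event."""
    predecessors = {e["dpv:isAfter"] for e in events if "dpv:isAfter" in e}
    heads = [e for e in events if e["@id"] not in predecessors]
    if len(heads) != 1:
        # A branching or broken chain cannot identify one latest event.
        raise ValueError(f"expected exactly one latest event, found {len(heads)}")
    return heads[0]


print(latest_event(events)["@type"])
```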

If the above use of DPV concepts answers your query, I will make a note to state such linking in the 27560 DPV document.

pmcb55 commented 10 months ago

Events are expected to have timestamps which specify the order.

Ah yeah, good point. But in this case, isn't there far more than just an implicit temporal lineage - i.e., doesn't each of these events (in the context of a ConsentRecord) explicitly supersede all previous ones, which I think very much justifies an explicit relationship link too?

the wasRevisionOf would not be the correct choice

Ah - yeah, I see what you mean. I was thinking of separate Consent records revising previous ones. So yeah, I'd probably need to look properly at the overall modelling of these 'changes-of-state-of-consent' (e.g., it feels to me that each change in a ConsentRecord perhaps should be modelled as a whole new instance of a ConsentRecord, with each instance being immutable (i.e., auditable/non-repudiable) forever).

I am also avoiding PROV-O to keep the model as simple as possible...

Sure, we all want to keep our models as simple as possible, but, I'd argue, not by redefining terms that already preexist in well established W3C standards, like PROV-O.

In DPV, we do have isBefore and isAfter without any restrictions...

I'm not a huge fan of those two terms (but I'd need to ponder more). Could PROV-O's Invalidation concept perhaps work better here - i.e., each new event (or instance of ConsentRecord) basically prov_o:invalidates the previous state of the user's consent?

coolharsh55 commented 10 months ago

doesn't each of these events (in the context of a ConsentRecord) explicitly supersede all previous ones, which I think very much justifies an explicit relationship link too?

Supersede in the sense that it updates the status of consent - yes.

Could PROV-O's Invalidation concept perhaps work better here - i.e., each new event (or instance of ConsentRecord) basically prov_o:invalidates the previous state of the user's consent?

No, it doesn't invalidate - that state is still applicable for the duration the event was active for. E.g. if consent was given on JAN-01 and withdrawn on DEC-31, then the latest status would be withdrawn, but it doesn't invalidate the earlier consent - it puts an end to its applicability. Any processing that takes place within JAN-01 to DEC-31 is valid based on that consent.
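
To make this 'period of applicability' reading concrete, here is a small sketch that looks up which status applied on a given day. The year, the helper names, and the choice to treat an event as applying from its own date onwards are all assumptions made for the example.

```python
from datetime import date

# Hypothetical event log for the JAN-01 / DEC-31 example (the year is assumed).
events = [
    (date(2023, 1, 1), "dpv:ConsentGiven"),
    (date(2023, 12, 31), "dpv:ConsentWithdrawn"),
]


def status_on(events, day):
    """Return the status of the most recent event on or before `day`, or None."""
    applicable = [(when, status) for when, status in events if when <= day]
    if not applicable:
        return None  # no consent event had happened yet
    return max(applicable, key=lambda pair: pair[0])[1]


def processing_is_covered(events, day):
    """True if processing on `day` falls within a period where consent was given."""
    return status_on(events, day) == "dpv:ConsentGiven"


print(processing_is_covered(events, date(2023, 6, 15)))
print(status_on(events, date(2024, 1, 1)))
```

The withdrawal changes the answer only for days from DEC-31 onwards. Days inside the earlier window still resolve to dpv:ConsentGiven, which is why the earlier state is ended rather than invalidated.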

it feels to me that each change in a ConsentRecord perhaps should be modelled as a whole new instance of a ConsentRecord, with each instance being immutable (i.e., auditable/non-repudiable) forever)

You can certainly create a new record for each new event - 27560 conformance does not prohibit it. But 27560 conformance does require a log of all events at that point in time be available within the consent record.

In DPV, we do have isBefore and isAfter without any restrictions

I'm not a huge fan of those two terms (but I'd need to ponder more).

We need before/after relations in the first place because graphs are not ordered - most people would be using stuff like JSON lists which maintain order. 27560 does not require the events be ordered in any way - but using the timestamps there is an implicit temporal order. So implementations can choose how they want to implement this. That's why in my initial use I didn't have any ordering relations or references, but am open to using DPV before/after or something else if it doesn't introduce additional implications for the information (so as to keep it as close to 27560 as possible).

If concepts are defined using PROV-O, it also requires the concepts to be declared e.g. as prov:Activity with prov:Agent - which is okay if you are familiar with PROV-O but the typical audience for this won't be using PROV-O and it introduces terms and information models not typical in law nor present in 27560. If someone wants to specifically use PROV-O, the current schema can be extended to create a new schema with PROV-O specific requirements.

pmcb55 commented 10 months ago

Could PROV-O's Invalidation concept...

No, it doesn't invalidate...

Sure, good point. But... if you bend your interpretation just a little bit(!), i.e., thinking of the new state of the Consent now invalidating the old state from the new event's point-in-time (i.e., 'now') forward (but only until some future event further 'invalidates' this new current state). You're still correct that any processing that occurred during that Consent-state's 'period-of-validity' is still perfectly valid processing. Yeah, I know, I'm kinda playing with the semantics, but my position is really this:

We're undeniably talking about 'Provenance' here, right? We're talking about Activities (i.e., giving Consent, withdrawing that Consent, etc.) that happen in a backward-traceable way over time. So it seems to be just classic provenance to me - i.e., there's nothing (at a core level) that is GDPR-specific about this particular provenance tracking. Therefore (in my thinking), rather than try to invent our own provenance concepts, we should instead try very hard to fit the core provenance aspects of this use-case into the only W3C standard there is for provenance - i.e. PROV-O.

And so yeah, even if that means having to bend one's perspective or interpretation of PROV-O's terms to fit the specific use-case (as I suggest above), then I think the interoperability benefits of sticking to 'the standard' massively outweigh redefining your own generic provenance-related terms.

My position does raise the obvious question though of whether PROV-O really is 'good enough' to be considered as the baseline for any-and-all provenance use-cases. I can't find any references to anyone calling for an updated PROV-O Version 2, whereas this blog (from Feb 2021) points to lots of extensions to PROV-O. All of them seem fairly minor in scope, but it does kinda imply that PROV-O itself might indeed be fit for representing the universal 'core concepts' of provenance.

...the typical audience for this won't be using PROV-O and it introduces terms and information models not typical in law nor present in 27560

Yeah, these are perfectly valid points too. But don't they apply to almost every Linked Data use-case...? I've seen it again-and-again where people want to keep their vocabs 'simple' (naturally, as we all do), and therefore often avoid reusing existing vocabs (to keep things simple), and so just re-invent the two or three terms that they need to use for their immediate, specific use-case.

But I've come to the conclusion that if you're going to adopt Linked Data at all (and that's a massive leap in itself, and already a massive 'burden' on any unfamiliar audience), then it's only a relatively small 'extra burden' to adopt standard/de-facto vocabs, like PROV-O, too (or SKOS, or gist, or QUDT, or W3C Time, or DPV(!)).

Perhaps I've gone too far off-topic here (so maybe I should create a new issue?), but I still think all of the above still applies directly to how to model the notion of 'latest consent state' - i.e., it's still just 'basic provenance modelling' :) !

coolharsh55 commented 10 months ago

Hi. We're still on the same topic - so using this issue is okay. I can get rid of that dct:provenance statement altogether, and just use dpv:hasConsentStatus with timestamps. DCT came up because we were discussing how to point to the latest state - and sioc:latestVersion is an option as well that doesn't require provenance.

I understand your point - and yes we should use existing standards/specs where relevant - but I don't think it is necessary to use PROV-O here. Just because Time Ontology is also a standard doesn't mean we should always use it whenever we need timestamps. Using XSD time or other forms is still okay. It's great that we have standards - but it doesn't mean we use them everywhere without considering whether they are needed and useful. I have tried using PROV-O and expressing consent before with GDPRov - it is NOT trivial and I had to be very considerate of interpretations. I think the modelling in GConsent is MUCH more sensible and useful, and it is what DPV also follows.

We can also ask why doesn't ODRL use PROV-O to describe activities and entities that parties agree to work with - and the answer would be because they target different requirements. You can creatively interpret ODRL obligations and rules as activities (with agents) - but that is not necessary. Same applies here.

coolharsh55 commented 10 months ago

For context on use of PROV-O - I had used GDPRov to depict consent records in terms of PROV activities and outputs. It looks okay in an example [1], but using it with the rest of the GDPR information creates differences in terminology and requires an understanding of PROV to understand what is being expressed. Using GConsent in the same example [2] results in a model closer to what is typically represented for GDPR. DPV's consent concepts try to avoid the shortcomings of both by being consistent with the domain terminology as well as with non-consent use-cases and information, and by being flexible and not restricting use to specific vocabularies.

[1] https://harshp.com/research/publications/035-representing-activities-processing-personal-data-consent-semweb-gdpr-compliance.html#sec:gdprov:use-case:SPECIAL
[2] https://harshp.com/research/publications/035-representing-activities-processing-personal-data-consent-semweb-gdpr-compliance.html#sec:gconsent:use-case:SPECIAL

coolharsh55 commented 10 months ago

Discussed in meeting 11 OCT and the group agreed that it is better to not include a way to indicate the latest state and to let implementations decide how they want to maintain the state - (see below alternatives). The information to derive the latest state will be present in the record/receipt via the timestamps.

It is possible to express the events as records (which can also be references to other Consent Records), and to use dpv:hasConsentStatus to indicate only the latest consent status.

{ "ex:consent-record": {
    "dpv:hasRecord": [
        { 
            "@type": "dpv:ConsentGiven",
            "dct:date": "timestamp"
        }, { 
            "@type": "dpv:ConsentWithdrawn",
            "dct:date": "timestamp"
        }
    ]}}
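
As a sketch of how an implementation could derive the latest state from such a record using only the timestamps, the example below fills the "timestamp" placeholders with made-up ISO 8601 values. The helper name `derive_latest_status` is hypothetical.

```python
from datetime import datetime

# Same shape as the record above; the timestamp values are invented for the example.
record = {
    "ex:consent-record": {
        "dpv:hasRecord": [
            {"@type": "dpv:ConsentGiven", "dct:date": "2023-01-01T09:00:00+00:00"},
            {"@type": "dpv:ConsentWithdrawn", "dct:date": "2023-12-31T17:30:00+00:00"},
        ]
    }
}


def derive_latest_status(record):
    """Return the @type of the event with the most recent dct:date."""
    events = record["ex:consent-record"]["dpv:hasRecord"]
    latest = max(events, key=lambda e: datetime.fromisoformat(e["dct:date"]))
    return latest["@type"]


print(derive_latest_status(record))
```

Because the result depends only on the timestamps, the events can be listed in any order, which matches the point that 27560 does not require them to be ordered.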

Otherwise, the alternative is to not have the latest status at all (similar to 27560) and only provide the events and let implementations determine what is the current status. Personally, I like having an explicit interpretation of what the current state is - as it helps document and share understanding of whether consent is being used for processing or not. So my preference leans towards using dpv:hasRecord to point to records (of consent events, or other consent records) and dpv:hasConsentStatus to indicate the current status. This also works nicely for receipts.

coolharsh55 commented 9 months ago

I have gone with not specifying a direct explicit way to identify the latest consent state, and instead listing all the 'consent events' with their timestamps - as required in ISO/IEC 27560:2023. See https://harshp.com/dpv-x/guides/consent-27560#consent-event-fields for the current draft.

I would still like to find some way to indicate the 'latest status' for consent in a declarative manner rather than requiring querying or parsing of all information. E.g. given a consent record, it would be nice to have a simple mechanism akin to traversing record -> latestStatus.

Creating a new property called hasLatestStatus would not be ideal as DPV has lots of different statuses - all of which can be argued to be relevant to have latest variants. AFAIK the issue about having a latest status only comes about when there are multiple statuses within a record/graph. A manageable alternative is to have a property called hasPastStatus to indicate it is NOT the current status - which would require making the record mutable, and then moving the older statuses under this while the latest status stays under hasStatus.
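
A rough sketch of the hasPastStatus idea, to show what the 'move older statuses' step would involve. hasPastStatus is only a proposal in this thread, so it is written here under a placeholder ex: prefix, and `update_status` is a hypothetical helper.

```python
def update_status(record, new_status):
    """Return a copy of `record` with `new_status` current and the old one archived."""
    past = list(record.get("ex:hasPastStatus", []))
    if "dpv:hasStatus" in record:
        # The previously current status is moved under the (proposed) past property.
        past.append(record["dpv:hasStatus"])
    return {**record, "dpv:hasStatus": new_status, "ex:hasPastStatus": past}


record = {"@id": "ex:CR"}
record = update_status(record, "dpv:ConsentGiven")
record = update_status(record, "dpv:ConsentWithdrawn")
print(record["dpv:hasStatus"], record["ex:hasPastStatus"])
```

The sketch returns a new dictionary on each update, but the stored record still changes over time, which is the mutability trade-off mentioned above.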

coolharsh55 commented 3 months ago

@pmcb55 I think I have a good solution for this by using DCAT v3, which supports indicating versions and last items in a collection. So we'd have something like this (simplifying for the example):

ex:CR a dpv:ConsentRecord, dcat:Resource ;
    dcat:hasCurrentVersion ex:CR-4 ;
    dcat:hasVersion 
        ex:CR-1, # requested
        ex:CR-2, # given
        ex:CR-3, # withdrawn
        ex:CR-4 . # given (again)

ex:CR-1 a dpv:ConsentRecord, dcat:Resource ;
    dcat:isVersionOf ex:CR ;
    dct:issued "2024-01-01"^^xsd:date ;
    dpv:hasConsentStatus dpv:ConsentRequested .

ex:CR-2 a dpv:ConsentRecord, dcat:Resource ;
    dcat:isVersionOf ex:CR ;
    dcat:hasPreviousVersion ex:CR-1 ;
    dct:issued "2024-02-15"^^xsd:date ;
    dpv:hasConsentStatus dpv:ConsentGiven .

ex:CR-3 a dpv:ConsentRecord, dcat:Resource ;
    dcat:isVersionOf ex:CR ;
    dcat:hasPreviousVersion ex:CR-2 ;
    dct:issued "2024-03-20"^^xsd:date ;
    dpv:hasConsentStatus dpv:ConsentWithdrawn .

ex:CR-4 a dpv:ConsentRecord, dcat:Resource ;
    dcat:isVersionOf ex:CR ;
    dcat:hasPreviousVersion ex:CR-3 ;
    dct:issued "2024-04-26"^^xsd:date ;
    dpv:hasConsentStatus dpv:ConsentGiven .

(Edit to add dcat:DatasetSeries example which is more 'semantically correct' if each state/record is an independent event and not a version of the same event)

ex:CR a dpv:ConsentRecord, dcat:DatasetSeries ;
    dcat:first ex:CR-1 ; # <--- start of events 
    dcat:last ex:CR-4 . # <--- last update - instead of dcat:hasCurrentVersion

ex:CR-1 a dpv:ConsentRecord, dcat:Resource ;
    dcat:inSeries ex:CR ; # <--- changed from dcat:isVersionOf
    dcat:next ex:CR-2 ; # <--- indicates that this isn't the 'latest' one
    dct:issued "2024-01-01"^^xsd:date ;
    dpv:hasConsentStatus dpv:ConsentRequested .

# CR-2 and CR-3 follow the same pattern

ex:CR-4 a dpv:ConsentRecord, dcat:Resource ;
    dcat:inSeries ex:CR ;
    dcat:prev ex:CR-3 ; # <--- changed from dcat:hasPreviousVersion
    dct:issued "2024-04-26"^^xsd:date ;
    dpv:hasConsentStatus dpv:ConsentGiven .

In this way, the records can be stored together in a single record as parts of that record, or even separately (e.g. immutable records) as updates to the earlier record. The central/main record has a way to point to the latest state / event for the activity. This also maps well with PROV if needed (since DCAT v3 has mappings to PROV).
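
To illustrate the record -> latest status traversal on the series variant, here is a minimal sketch over a hand-written dictionary standing in for the graph. The dictionary layout and helper names are assumptions for the example, and CR-2 and CR-3 are filled in following the same pattern with the statuses from the versioned example.

```python
# Hypothetical in-memory view of the series example: subject -> property -> value.
graph = {
    "ex:CR": {"dcat:first": "ex:CR-1", "dcat:last": "ex:CR-4"},
    "ex:CR-1": {"dpv:hasConsentStatus": "dpv:ConsentRequested"},
    "ex:CR-2": {"dcat:prev": "ex:CR-1", "dpv:hasConsentStatus": "dpv:ConsentGiven"},
    "ex:CR-3": {"dcat:prev": "ex:CR-2", "dpv:hasConsentStatus": "dpv:ConsentWithdrawn"},
    "ex:CR-4": {"dcat:prev": "ex:CR-3", "dpv:hasConsentStatus": "dpv:ConsentGiven"},
}


def latest_status(graph, series):
    """Follow series -> dcat:last -> dpv:hasConsentStatus."""
    return graph[graph[series]["dcat:last"]]["dpv:hasConsentStatus"]


def history(graph, series):
    """Walk dcat:prev links back from the last record; return ids oldest first."""
    chain = []
    node = graph[series]["dcat:last"]
    while node is not None:
        chain.append(node)
        node = graph[node].get("dcat:prev")
    return list(reversed(chain))


print(latest_status(graph, "ex:CR"))
print(history(graph, "ex:CR"))
```

The latest status takes two lookups from the main record, and the full history is still recoverable by following the links backwards.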

Querying is also simple - look for records that are not objects of dcat:hasPreviousVersion relation as they will be the latest ones. E.g.

PREFIX dpv: <https://w3id.org/dpv#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT * WHERE { 
    ?s a dpv:ConsentRecord .
    FILTER NOT EXISTS { ?x dcat:hasPreviousVersion ?s } .
}
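
For readers who want to check the logic without a SPARQL engine, the same filter can be sketched over a plain list of triples. The triples mirror the versioned example, with property names as written in this thread, and `latest_records` is a hypothetical helper.

```python
# Triples from the versioned example, reduced to what the query touches.
triples = [
    ("ex:CR", "rdf:type", "dpv:ConsentRecord"),
    ("ex:CR-1", "rdf:type", "dpv:ConsentRecord"),
    ("ex:CR-2", "rdf:type", "dpv:ConsentRecord"),
    ("ex:CR-3", "rdf:type", "dpv:ConsentRecord"),
    ("ex:CR-4", "rdf:type", "dpv:ConsentRecord"),
    ("ex:CR-1", "dcat:isVersionOf", "ex:CR"),
    ("ex:CR-2", "dcat:isVersionOf", "ex:CR"),
    ("ex:CR-3", "dcat:isVersionOf", "ex:CR"),
    ("ex:CR-4", "dcat:isVersionOf", "ex:CR"),
    ("ex:CR-2", "dcat:hasPreviousVersion", "ex:CR-1"),
    ("ex:CR-3", "dcat:hasPreviousVersion", "ex:CR-2"),
    ("ex:CR-4", "dcat:hasPreviousVersion", "ex:CR-3"),
]


def latest_records(triples, versions_only=False):
    """Consent records that are not the previous version of any other record."""
    records = {s for s, p, o in triples if p == "rdf:type" and o == "dpv:ConsentRecord"}
    superseded = {o for s, p, o in triples if p == "dcat:hasPreviousVersion"}
    latest = records - superseded
    if versions_only:
        # Keep only records that are declared to be a version of something.
        latest &= {s for s, p, o in triples if p == "dcat:isVersionOf"}
    return latest


print(sorted(latest_records(triples)))
print(sorted(latest_records(triples, versions_only=True)))
```

On this data the plain filter also returns the parent ex:CR, since it is typed dpv:ConsentRecord and is never the object of dcat:hasPreviousVersion. Requiring dcat:isVersionOf, as the `versions_only` flag does, is one possible way to keep only the per-event records.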

Additionally, if we're using DCAT (v3), it makes sense to take full advantage of this and jump in with both feet. The records are resources, and a receipt is a 'catalog of resources' i.e. it can provide multiple records within it. DCAT also means you can use existing portals/catalog tools. So to me this makes a lot of sense in terms of being clean, clear, and practical.

(Edit: okay, this also maps super nicely onto GConsent! So any existing uses/models based off it are compatible with what is described here.)

ghurlbot commented 1 month ago

Comment by @coolharsh55 via IRC channel #dpvcg on irc.w3.org

beatriz to follow up on this from local implementation

besteves4 commented 3 weeks ago

Hi @coolharsh55, small question: in the example in the guide, a record has distinct consent statuses. But here we want to associate a record with a status in a one-to-one relationship, right? So that we then use DCAT v3 to establish the order of the records. Would such an implementation still be aligned with ISO 27560?

coolharsh55 commented 3 weeks ago

@besteves4 interpreting your question as whether we can add statuses within individual records and bundle them together using a catalog or another resource - the answer is yes. 27560 considers this an operational detail of an implementation, as long as we can get the information defined in 27560.

So we can have one record with multiple "events" representing different consent states, or separate records for each event.