w3c / dpv

Data Privacy Vocabularies and Controls CG (DPVCG)
https://w3id.org/dpv

Propose "prospective provider" as a concept #146

Closed bact closed 4 months ago

bact commented 4 months ago

Discussion on whether it is necessary to have a concept for "prospective provider".

coolharsh55 commented 4 months ago

Hi. Thanks for the well-researched suggestion. Makes sense to me, though this should be added to the AI Act extension and not the Tech extension. @delaramglp what do you think about this? Okay to integrate?

DelaramGlp commented 4 months ago

Thanks @bact! Great suggestion. Agree to add it to the AI Act extension.

coolharsh55 commented 4 months ago

Thanks both. I'll add aiact:ProspectiveProvider to the AI Act extension as a subclass of tech:Actor.

bact commented 4 months ago

Thanks both. Agree to have it in the AI Act extension instead, since it is more specific to the AI Act.

bact commented 4 months ago

For the definition of aiact:ProspectiveProvider, it could be

  1. "Actor that is expected to provide Technology"; or
  2. "Actor that is looking towards the future to provide Technology"

(2) may be clearer, since "expected" in (1) does not make clear whose expectation it is.

"Looking towards the future" may also carry less of a sense of expectation.

TallTed commented 4 months ago

"Actor that anticipates they will provide Technology"?

bact commented 4 months ago

@TallTed nicely put as well. Indicates the Actor's intention, I think.

coolharsh55 commented 4 months ago

Adapting the existing Provider definition to reflect how the Act describes the Prospective Provider, we get: "Natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model or that has an AI system or a general-purpose AI model developed - but has not yet placed them on the market or put the system into service". This aligns it with Provider and makes the distinction clear. Shall we use this?

bact commented 4 months ago

Thanks @coolharsh55. Agree that it will suit the legal context of the AI Act better.

coolharsh55 commented 4 months ago

Accepted in https://w3id.org/dpv/meetings/meeting-2024-05-15 (issue will be closed automatically by commit)
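Putting the agreed pieces together, the new concept might be sketched in Turtle roughly as follows. The prefix IRIs and the exact modelling pattern are assumptions for illustration only; DPV uses its own templates and conventions, so the published declaration may differ:

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
# Illustrative prefixes; the published namespace IRIs may differ.
@prefix aiact: <https://w3id.org/dpv/ai-act#> .
@prefix tech:  <https://w3id.org/dpv/tech#> .

aiact:ProspectiveProvider
    a rdfs:Class ;
    rdfs:subClassOf tech:Actor ;
    skos:prefLabel "Prospective Provider"@en ;
    skos:definition "Natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model or that has an AI system or a general-purpose AI model developed - but has not yet placed them on the market or put the system into service"@en .
```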

TallTed commented 3 months ago

There must be a way to push changes (1) in chunks smaller than "320 changed files with 107,634 additions and 89,119 deletions", as that scale is simply not meaningfully reviewable, and optimally (2) as Pull Requests, which provide easy mechanisms for suggesting changes and catching problems; here, such suggestions can only be submitted as full-fledged PRs of their own.

coolharsh55 commented 3 months ago

Hi. Yes, that's an issue. The commit includes changes to both the source templates and the compiled outputs, so we can only review the source templates first. Even then there is a lot of noise, as RDF formats like XML are unordered, which leads to different output each time they are generated. In theory, we could use separate code and output branches to avoid this issue.

TallTed commented 3 months ago

> The commit includes changes to the source templates and the compiled outputs.

It is generally recommended to not have generated artefacts/outputs under git management; only the generators and inputs. If the output does remain in git/GitHub management, it should be in distinct subdirectories that provide some hints of what they contain, especially input vs output. This helps make plain what reviewers of those documents should be looking for (e.g., does this output make sense? is there a problem with this code? is there a problem with some input(s)?).
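One conventional layout matching this suggestion might look like the sketch below. The directory names are illustrative assumptions, not the repository's actual structure:

```text
repo/
├── code/     # generators and build scripts (reviewed in PRs)
├── sources/  # input templates (reviewed in PRs)
└── dist/     # generated RDF outputs; ideally git-ignored and
              # published as release artefacts instead
```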

As to the "unordered" (a/k/a "arbitrarily ordered") RDF (directed-graph-relational) data formats, I would suggest you consider producing one RDF format, whose content can be ordered in various ways during its production (including ordering the inputs). RDF/XML is of remarkably little utility today, given its early prominence. I would recommend that you consider focusing on N-Triples or Turtle for single-graph documents, and N-Quads or TriG for multiple-graph documents. These are quite easily transformed to RDF/XML, JSON-LD, or any other RDF serialization, if and when someone needs one of those serializations.

Worth noting, since you called RDF's order out — SQL (tabular-relational) data is also arbitrarily ordered, unless one uses an ORDER BY clause in their query. It's important to mention that such an ORDER BY clause also exists in RDF's SPARQL. I can't think of a serious data manipulation tool or language that doesn't have similar functionality.
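The ORDER BY point can be illustrated with a small SPARQL query; the `ex:` IRIs are made up for the example:

```sparql
PREFIX ex: <http://example.org/>

# Without ORDER BY, result order is arbitrary, just as triple order
# in an RDF serialization is arbitrary. ORDER BY makes it deterministic.
SELECT ?concept ?label
WHERE {
  ?concept a ex:Concept ;
           ex:label ?label .
}
ORDER BY ?label
```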

coolharsh55 commented 3 months ago

Hi. All the 'code' and generation parts are in the code folder; the rest are all outputs and resources. In the near future, the idea is that the code and generation parts will be moved to a separate branch, so that only the outputs are in the main branch.

For dropping RDF serialisations, I am not in favour of doing this, as best practice dictates providing convenience via content negotiation. We already have these formats, and they provide convenience for adopters whose implementations are set up to use them. Unless best practice recommends that XML is no longer needed, I think it should be provided for content negotiation. Since the content is the same across all RDF formats, a reviewer can focus only on their format of choice, e.g. Turtle.

The randomness in the XML and JSON-LD outputs is because we use rdflib, and that is its behaviour (I couldn't find a way to stop it from happening).
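Since the serializer output order is non-deterministic, one diff-friendly workaround (a sketch, not the project's actual tooling; `canonicalize_ntriples` is a hypothetical helper) is to compare a line-based serialization such as N-Triples after sorting its lines:

```python
def canonicalize_ntriples(document: str) -> str:
    """Return the same triples with lines deduplicated and sorted.

    N-Triples is one statement per line, so sorting the lines yields a
    stable text for the same graph regardless of serializer order.
    (This is a simple sketch; it does not canonicalize blank node labels.)
    """
    lines = [ln.strip() for ln in document.splitlines() if ln.strip()]
    return "\n".join(sorted(set(lines))) + "\n"


doc_a = (
    '<http://example.org/a> <http://example.org/p> "1" .\n'
    '<http://example.org/b> <http://example.org/p> "2" .\n'
)
doc_b = (
    '<http://example.org/b> <http://example.org/p> "2" .\n'
    '<http://example.org/a> <http://example.org/p> "1" .\n'
)
# Two serializer runs that emit the triples in different orders now
# compare equal, so spurious diffs disappear from review.
assert canonicalize_ntriples(doc_a) == canonicalize_ntriples(doc_b)
```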

TallTed commented 3 months ago

> because we use rdflib and that is its behaviour

I think you're saying that you use rdflib to translate from one serialization to the others. That's fine. The serialization you actually generate is the RDF that others should primarily review (if they review the RDF at all), to confirm that your code is producing good output. However...

> only the outputs are in the main branch

That's the opposite of best practice. The outputs should be in an artefact dump. The inputs should be in the main branch.