w3c / dpv

Data Privacy Vocabularies and Controls CG (DPVCG)
https://w3id.org/dpv

Propose "prospective provider" as a concept #146

Closed bact closed 4 months ago

bact commented 4 months ago

Discussion on whether it is necessary to have a concept for "prospective provider".

coolharsh55 commented 4 months ago

Hi. Thanks for the well-researched suggestion. Makes sense to me, though this should be added to the AI Act extension and not the Tech extension. @delaramglp what do you think about this? Okay to integrate?

DelaramGlp commented 4 months ago

Thanks @bact! Great suggestion. Agree to add it to the AI Act extension.

coolharsh55 commented 4 months ago

Thanks both. I'll add aiact:ProspectiveProvider to the AI Act extension as a subclass of tech:Actor.

bact commented 4 months ago

Thanks both. Agree to have it in the AI Act extension instead, since it is more specific to the AI Act.

bact commented 4 months ago

For the definition of aiact:ProspectiveProvider, it could be

  1. "Actor that is expected to provide Technology"; or
  2. "Actor that is looking towards the future to provide Technology"

(2) may be clearer, since "expected" in (1) does not make clear whose expectation it is.

"Looking towards the future" may also carry less of a sense of expectation.

TallTed commented 4 months ago

"Actor that anticipates they will provide Technology"?

bact commented 4 months ago

@TallTed nicely put as well. Indicates the Actor's intention, I think.

coolharsh55 commented 4 months ago

Adapting the existing Provider definition to reflect how the Act describes the Prospective Provider, we get: "Natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model or that has an AI system or a general-purpose AI model developed - but has not yet placed them on the market or put the system into service". This aligns it with Provider and makes the distinction clear. Shall we use this?

bact commented 4 months ago

Thanks @coolharsh55. Agree that it will suit the legal context of the AI Act better.

coolharsh55 commented 4 months ago

Accepted in https://w3id.org/dpv/meetings/meeting-2024-05-15 (issue will be closed automatically by commit)
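Putting the agreed pieces together, the new concept might be sketched in Turtle roughly as follows. The prefix IRIs and the exact modelling pattern are assumptions for illustration only; DPV uses its own templates and conventions, so the published declaration may differ:

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
# Illustrative prefixes; the published namespace IRIs may differ.
@prefix aiact: <https://w3id.org/dpv/ai-act#> .
@prefix tech:  <https://w3id.org/dpv/tech#> .

aiact:ProspectiveProvider
    a rdfs:Class ;
    rdfs:subClassOf tech:Actor ;
    skos:prefLabel "Prospective Provider"@en ;
    skos:definition "Natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model or that has an AI system or a general-purpose AI model developed - but has not yet placed them on the market or put the system into service"@en .
```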

TallTed commented 3 months ago

There must be a way to push changes (1) in chunks smaller than "320 changed files with 107,634 additions and 89,119 deletions", as that scale is simply not meaningfully reviewable, and optimally (2) as Pull Requests, which provide easy mechanisms for suggesting changes and catching problems; here, such suggestions can only be submitted as full-fledged PRs of their own.

coolharsh55 commented 3 months ago

Hi. Yes, that's an issue. The commit includes changes to both the source templates and the compiled outputs, so we can only review the source templates first. Even then there is a lot of noise, as RDF formats like XML are unordered, which leads to different output each time they are generated. In theory, we could use separate code and output branches to avoid this issue.

TallTed commented 3 months ago

> The commit includes changes to the source templates and the compiled outputs.

It is generally recommended to not have generated artefacts/outputs under git management; only the generators and inputs. If the output does remain in git/GitHub management, it should be in distinct subdirectories that provide some hints of what they contain, especially input vs output. This helps make plain what reviewers of those documents should be looking for (e.g., does this output make sense? is there a problem with this code? is there a problem with some input(s)?).
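One conventional layout matching this suggestion might look like the sketch below. The directory names are illustrative assumptions, not the repository's actual structure:

```text
repo/
├── code/     # generators and build scripts (reviewed in PRs)
├── sources/  # input templates (reviewed in PRs)
└── dist/     # generated RDF outputs; ideally git-ignored and
              # published as release artefacts instead
```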

As to the "unordered" (a/k/a "arbitrarily ordered") RDF (directed-graph-relational) data formats, I would suggest you consider producing one RDF format, whose content can be ordered in various ways during its production (including ordering the inputs). RDF/XML is of remarkably little utility today, given its early prominence. I would recommend that you consider focusing on N-Triples or Turtle for single-graph documents, and N-Quads or TriG for multiple-graph documents. These are quite easily transformed to RDF/XML, JSON-LD, or any other RDF serialization, if and when someone needs one of those serializations.

Worth noting, since you called RDF's order out — SQL (tabular-relational) data is also arbitrarily ordered, unless one uses an ORDER BY clause in their query. It's important to mention that such an ORDER BY clause also exists in RDF's SPARQL. I can't think of a serious data manipulation tool or language that doesn't have similar functionality.
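The ORDER BY point can be illustrated with a small SPARQL query; the `ex:` IRIs are made up for the example:

```sparql
PREFIX ex: <http://example.org/>

# Without ORDER BY, result order is arbitrary, just as triple order
# in an RDF serialization is arbitrary. ORDER BY makes it deterministic.
SELECT ?concept ?label
WHERE {
  ?concept a ex:Concept ;
           ex:label ?label .
}
ORDER BY ?label
```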

coolharsh55 commented 3 months ago

Hi. All the 'code' and generation parts are in the code folder; the rest are all outputs and resources. In the near future, the idea is that the code and generation parts will be moved to a separate branch, so that only the outputs are in the main branch.

For dropping RDF serialisations, I am not in favour of doing this, as best practice dictates providing convenience via content negotiation. We already have these formats, and they provide convenience for adopters whose implementations are set up to use them. Unless best practice recommends that XML is no longer needed, I think it should be provided for content negotiation. Since the content is the same across all RDF formats, a reviewer can focus only on their format of choice, e.g. Turtle.

The randomness in the XML and JSON-LD outputs is because we use rdflib, and that is its behaviour (I couldn't find a way to stop it from happening).
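Since the serializer output order is non-deterministic, one diff-friendly workaround (a sketch, not the project's actual tooling; `canonicalize_ntriples` is a hypothetical helper) is to compare a line-based serialization such as N-Triples after sorting its lines:

```python
def canonicalize_ntriples(document: str) -> str:
    """Return the same triples with lines deduplicated and sorted.

    N-Triples is one statement per line, so sorting the lines yields a
    stable text for the same graph regardless of serializer order.
    (This is a simple sketch; it does not canonicalize blank node labels.)
    """
    lines = [ln.strip() for ln in document.splitlines() if ln.strip()]
    return "\n".join(sorted(set(lines))) + "\n"


doc_a = (
    '<http://example.org/a> <http://example.org/p> "1" .\n'
    '<http://example.org/b> <http://example.org/p> "2" .\n'
)
doc_b = (
    '<http://example.org/b> <http://example.org/p> "2" .\n'
    '<http://example.org/a> <http://example.org/p> "1" .\n'
)
# Two serializer runs that emit the triples in different orders now
# compare equal, so spurious diffs disappear from review.
assert canonicalize_ntriples(doc_a) == canonicalize_ntriples(doc_b)
```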

TallTed commented 3 months ago

> because we use rdflib and that is its behaviour

I think you're saying that you use rdflib to translate from one serialization to the others. That's fine. The serialization you actually generate is the RDF that others should primarily review (if they review the RDF at all), to confirm that your code is producing good output. However...

> only the outputs are in the main branch

That's the opposite of best practice. The outputs should be in an artefact dump. The inputs should be in the main branch.