w3c / dpv

Data Privacy Vocabularies and Controls CG (DPVCG)
https://w3id.org/dpv

Provide vocabulary to specify purposes and permissions related to AI training #82

Open scottkellum opened 1 year ago

scottkellum commented 1 year ago

I’m not sure this is the correct place to file this issue, but I would love some standardized way to disallow my content (writing, photography, code) from being used in AI training data.

I’m imagining an extension of robots.txt where I can explicitly disallow crawlers that search for AI training data.

Alternatively, some standard way to indicate copyright permissions, including whether use for training is allowed, might also be helpful, but that is probably more complicated.

Ultimately I want people to find my work, but I don’t want it to end up in an AI model for others to make things in the style of my work.
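
For illustration, the robots.txt idea above could be sketched as follows. This is only a minimal sketch using Python's standard `urllib.robotparser`; the user-agent names `ExampleAITrainingBot` and `ExampleSearchBot` and the `example.com` URL are hypothetical placeholders, not real crawlers. It also assumes that crawlers identify themselves honestly and choose to honor the file, since robots.txt is advisory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI-training crawler,
# allow everyone else so the work stays discoverable.
ROBOTS_TXT = """\
User-agent: ExampleAITrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/photos/1.jpg"
    for agent in ("ExampleAITrainingBot", "ExampleSearchBot"):
        print(agent, "allowed" if can_fetch(agent, page) else "disallowed")
```

A per-crawler rule like this only works if each AI-training crawler uses a known, distinct user-agent name, which is part of why a purpose-based vocabulary may be a better fit than listing crawlers one by one.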

coolharsh55 commented 1 year ago

Hi. Thanks for the proposal. This is an interesting application that I/we hadn't foreseen. Of the existing vocabularies, I think ODRL would be a good option for specifying machine-readable licenses that indicate what is permitted and prohibited, and schema.org might suffice for specifying types of content (e.g. images, videos). What is then left is specifying purposes such as training an ML model, for which I don't think there are existing vocabularies.

In DPVCG, we are interested in expanding the DPV to cover more regulations, such as the EU's AI Act, where such purposes are relevant. So I think this can function as a use case towards the development of AI-relevant vocabularies, including purposes. For example, https://w3id.org/AIRO#training is a concept from @DelaramGlp's work on AI-related risk management that refers to the training phase in the AI development lifecycle. In DPV, this could be a category of purpose. You and others are welcome to help with these efforts, to propose such purposes, or to contribute more use cases and examples.
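
As a rough illustration of the ODRL suggestion above, the sketch below builds a JSON-LD policy that prohibits use of one asset for a given purpose. It is an assumption-laden sketch, not an agreed modelling: the helper name `ai_training_prohibition`, the `example.com` identifiers, and the choice of https://w3id.org/AIRO#training as the value of the purpose constraint are all illustrative, since no DPV purpose concept for AI training exists in this thread.

```python
import json

# Assumption: AIRO's "training" concept stands in for a future purpose term.
AI_TRAINING_PURPOSE = "https://w3id.org/AIRO#training"


def ai_training_prohibition(policy_uid: str, target: str,
                            purpose_iri: str = AI_TRAINING_PURPOSE) -> dict:
    """Build an ODRL Set policy (JSON-LD) prohibiting use of target for purpose_iri."""
    return {
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Set",
        "uid": policy_uid,
        "prohibition": [{
            "target": target,
            "action": "use",
            # Restrict the prohibition to uses whose purpose is AI training.
            "constraint": [{
                "leftOperand": "purpose",
                "operator": "eq",
                "rightOperand": {"@id": purpose_iri},
            }],
        }],
    }


if __name__ == "__main__":
    policy = ai_training_prohibition(
        "https://example.com/policy/no-ai-training",  # hypothetical policy IRI
        "https://example.com/photos/1.jpg",           # hypothetical asset IRI
    )
    print(json.dumps(policy, indent=2))
```

How such a policy would be published alongside the content and discovered by crawlers is left open here.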