w3ctag / ethical-web-principles

W3C TAG Ethical Web Principles
https://w3ctag.github.io/ethical-web-principles/

Participating in the Open Web with the Expectation You Won't Be Part of Data Harvests #109

Closed cgrobb closed 3 months ago

cgrobb commented 7 months ago

The web is the ultimate digital venue for sharing professional works. Professional artists especially use the open web to share samples of their works as a mechanism for a) getting more work and b) being discovered by other artists who want to collaborate. The professional "work" here is often a raster or vector web format that has its own https URI(s).

Over the past few years, data harvesters have been harvesting these professional works at an unprecedented scale for use as digital raw materials for generative AI models whose outputs are commercial. The quality and completeness of the outputs of these commercial systems depend primarily on the harvested inputs.

The issue is well-framed by the top resources in this Google query: https://www.google.com/search?q=robots.txt+abuse+generative+AI

Given the scale of the economic harms, I was surprised that this doc has no discussion of intellectual property, licensing, or robots.txt abuse, and that its discussion of data rights doesn't address professional creative works as data.

I'd like to see the doc incorporate professional data rights and the right to opt out of data harvesting as fundamental ethical web principles and reference any related standards work.

rhiaro commented 7 months ago

Thanks for raising this. We discussed this in our breakout today, and we agree that what you describe is a harmful practice, both for end users and the integrity of the web platform as a whole. At the moment we think that a discussion of IP/copyright specifically is too low level for the EWP, but it connects to broader discussions we've been having recently about the web being used for harmful exploitative/extractive practices in general. We anticipate that we will at some point write something on this topic (like a Finding) that goes into more detail, which we can then link to from an existing EWP (for example, "does not cause harm" or "enhances individual control and power").

cgrobb commented 7 months ago

Thanks for the quick reply. Good to know of the broader discussion.

I'm (just now) seeing the existing W3C work on policy expression (https://en.wikipedia.org/wiki/ODRL).

Here's a resource that frames data rights as human rights, and vice versa: https://www.regulations.gov/comment/COLC-2023-0006-10317 "This would lead to the closing up of the web as organizations protect themselves in other ways, the disappearance of revenue streams for many worthwhile jobs (like Artist or Journalist), and the loss of all human rights included within data rights."

csarven commented 7 months ago

There is also the W3C CG work on the Data Privacy Vocabulary:

[..] enables expressing machine-readable metadata about the use and processing of personal data based on legislative requirements [..]

torgo commented 3 months ago

We've agreed to close this as it's out of scope for this specific document. We think we've addressed this issue to some degree in the Privacy Principles - specifically in data rights. As @rhiaro mentioned, we may also pick this up for a future finding.