Public Provision

darobin commented 6 months ago

We are thankful that you are producing this document, which we believe has genuine potential to shape better outcomes for AI and the web.

One aspect which you don’t address is the need for public provision of AI. Over the past decades we have witnessed the consequences of having many key infrastructural components of the web and of our digital lives being entirely provided by private actors; we believe that it is essential to establish the public provision of an AI commons today, before privatized infrastructure captures the market and becomes entrenched.

With this in mind, we would like to suggest the addition of a section to your document to cover this concern, maybe like the one below (happy to make a PR if that’s helpful). (Note: the fenced section is meant to match the .advisement sections in the doc.)

@thelastjosh & @darobin (Public AI Network)

A recurring issue on the web is that several of its critical infrastructure components, such as search and social, are entirely provided by a highly concentrated set of private actors who operate without accountability to their users. This brings the web out of alignment with its users’ needs and makes it challenging for the web to deliver on the ethical, human-centric elements of its mission.

As AI establishes itself as an important part of the web, we need to ensure that we do not repeat the mistakes of previous eras. The web is a commons and we need the web and web standards to support a thriving, commons-based production of AI systems. This commons-based production includes local and public provision of AI components (models, data, etc.) from such actors as territorial states, cities, or any open project with democratic governance. For that to happen, we need to ensure that such actors can support or create, at a sustainable cost, AI applications and services and that these can remain competitive over time.

The signs of a healthy commons to establish for AI on the web include:

- a thriving ecosystem of AI models, not just a dominant one or two or three that serve as oracles for all of humankind,
- equal access to web-scale data,
- the absence of extractive methods centered on using personal data to train models, and
- people having a meaningful say on how any AI that they rely on or that impacts their lives works.

Web standards that reinforce or reinstate the web's status as a public commons could (and, arguably, should) lead to a version where AI itself becomes a kind of public commons, rather than a privately operated service like search engines or social networks. This may require resolving the tension between making the web a better place for people versus making the web a better place for computers (including AI).

One plausible role for the W3C is to establish metadata standards for AI. These could cover aspects
such as whether the governance of a model is public, whether data that went into training it is public
open data or has a specific national origin, or what principles guided the collection of training data,
notably from a privacy standpoint.
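
As a rough illustration of what such metadata might look like, here is a minimal sketch of a machine-readable record. Everything in it is an assumption made for illustration: the `ModelMetadata` class, its field names, and the example values are hypothetical and do not come from any existing W3C specification or vocabulary.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ModelMetadata:
    """Hypothetical metadata record for an AI model (illustrative field names only)."""

    model_name: str
    public_governance: bool      # is the governance of the model public?
    training_data_public: bool   # is the training data public open data?
    data_national_origin: Optional[str] = None  # e.g. a country code, if applicable
    data_collection_principles: List[str] = field(default_factory=list)


def to_json(meta: ModelMetadata) -> str:
    """Serialize the record to JSON so it can be published alongside a model."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


# Example values are made up for the sketch.
example = ModelMetadata(
    model_name="example-public-model",
    public_governance=True,
    training_data_public=True,
    data_national_origin="CH",
    data_collection_principles=["no personal data", "opt-in consent"],
)
print(to_json(example))
```

A real standard would more likely define a shared vocabulary than a single fixed record shape; the sketch only shows that the aspects listed above map naturally onto a small set of declarative fields.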

Several public AI initiatives are focused on issues of language in LLMs. The W3C’s Internationalization 
activity could provide its expertise in support of such work, and could potentially help source content 
from a greater and better-labeled array of languages.

The use of personal data servers (PDS) could be developed, via web standards, in order to support 
people in training their own models using their personal data without having to provide it to a commercial 
actor. Public models could be produced specifically for this kind of usage, to help protect AI usage from 
data-centric business models.

iherman commented 5 months ago

I believe W3C's role may include developing guidelines on the conversational user interface in browsers, which still dominates the way people interact with LLMs today (and the inclusion of LLMs in search might reinforce this). These guidelines might address accessibility and internationalization aspects, but also issues related to the provenance of the data and the underlying reasoning and, possibly, aspects of both security and privacy. (Many of these issues are already mentioned in other "advisement" sections.) Issues of ergonomics, user interface, etc. should also get special attention.

W3C has experience, largely though not exclusively through the WAI activity, in creating such guidelines and in managing and presenting them. This may become extremely useful.

(This is clearly related to #25.)

densalzmann commented 3 months ago

To "reinforce or reinstate the web's status as a public commons" would, in my understanding, also mean initiating recommendations for secure access to non-public data, using, for instance, techniques like Federated Learning, Differential Privacy, and Encrypted Computation. Solutions for this purpose are also emerging in the open source community, e.g. OpenMined.
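
To make one of these techniques concrete, here is a minimal sketch of differential privacy using the Laplace mechanism on a counting query. It is a textbook illustration written with the Python standard library only; it does not use or represent OpenMined's API, and the function names (`laplace_noise`, `private_count`) and the sample data are assumptions made for this example.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives a mean of `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    values: Iterable[int],
    predicate: Callable[[int], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a noisy count of the values matching `predicate`.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical non-public data: ages of individuals.
ages = [23, 35, 41, 52, 67, 29]
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
print(f"Noisy count of people aged 40+: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy; the caller only ever sees the noisy answer, never the underlying records.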

w3c / ai-web-impact

Public Provision #26
