Should this document also consider data on the web?

azaroth42 commented 5 years ago

The document asserts that the architecture of the web is browser based, yet there are many other classes of user agent and many other documents not intended for browsers that are published on the web. These web resources also generate ethical considerations, including many which are already covered.

The document acknowledges that ...

there are a raft of other technologies, standards, languages and APIs that come together to form the "web platform."

But then only discusses browser user agents:

The architecture of the browser-based web is built from a user agent, the browser, [...]

The first principle is

There is one web.

Yet this document divides the one international web into two classes -- the web of HTML documents for browsers, and ... the web of data that isn't discussed.

Another principle reinforces this divide:

The web is multi-browser, multi-OS and multi-device.

Saying Multi-browser is not the same as saying multi-agent. There are many non-browser agents or clients that consume data published on the web.

Thank you for your consideration of the issue!

hadleybeeman commented 5 years ago

YES! Yes it should. Thanks for flagging. We need to find a way to fix this.

I'll admit, I'm struggling with the vocabulary here. I definitely agree that "multi-browser is not the same as saying multi-agent"... and "multi-agent" or "client" is accurate, but I'm concerned that it doesn't sound descriptive enough. We're trying to write for a broad technical audience here, not just spec authors from the web of data or the browser-based world... So what's a good, plain-English way to explain what we mean?

danbri commented 5 years ago

There's a larger issue here, replacing "web of data" with "web content, services and platforms" in Rob's question. How far beyond "web platform standards" does TAG seek for this document to have impact and relevance? Reading the document I get mixed messages.

The core focus seems to be "the Web as a technology stack" aka "web platform" with a strong browser-oriented emphasis, but in various places the scope feels larger, as if talking about all Web content, applications and platforms built for and in the Web. For example, "We will write specs and build platforms" (who will build what? is "we" the TAG or an allusion to a broader vaguer "we"?), also "It also serves to raise awareness of the ethical responsibilities of web makers" seems to use the broader sense of "web", i.e. web content/services/platforms not just the Web technology itself.

Yet this document divides the one international web into two classes -- the web of HTML documents for browsers, and ... the web of data that isn't discussed.

I don't see the document making that division, but it suffers from vagueness regarding a related one: using "the Web" to mean the core technology, vs "all that stuff we express via the technology".

We know from Microdata and RDFa and other efforts that HTML content is data. I see nothing in the finding that excludes W3C's data-oriented RDF, XML etc. specs from being covered. It's more that two things are in tension: a) the document is growing out from a currently browser-platform-centric TAG, b) the document is concerned with high level principles. Because of these two things it often talks in sweeping terms that suggest grander goals, while at the same time being somewhat tied to the technology platform aspect. This makes it hard sometimes to understand how broadly the principles are expected to be applicable, for example "The web must make it possible for people to verify the information they see" is fine and great when talking about e.g. browser UX for padlocks, or conventions for when it is reasonable to display a URL in the URL bar; but the "must" feels strained when considering less browser-oriented concerns like misinformation at the level of ideas and propaganda rather than at the level of digital signatures and computer protocols. I fear the broader social aspirations (which many of us share) weaken the more specific applicability of the principles to the immediate environment of Web platform UX.

Saying Multi-browser is not the same as saying multi-agent. There are many non-browser agents or clients that consume data published on the web.

Yes - for example Search engines (the more the merrier). (To complicate the picture, some but not all search engines run browsers headlessly server-side as part of their processing of Web content.)

My sense is that the document takes values that many of us in the Web standards community share, and squeezes them into the shape of a TAG Finding which doesn't entirely suit them. W3C technology specifications are by their very nature rather indirect and multi-purpose, and so it is natural for us to want the principles to be applied more directly at the point where Web standards are used in real life (requiring "awareness of the ethical responsibilities of web makers", or engagement from platforms where the misinformation-vs-censorship debates play out, etc.).

On Rob's point about the "Web of data", can I suggest a test case for one of the principles? Imagine that the W3C PICS Working Group (https://www.w3.org/PICS/ - defunct since 1995-1997) have proposed PICS as a content filtering component for use in Web services (caches, portals, forums) and in browsers, and are checking their designs against these principles. What does the principle "The web must enable freedom of expression" have to say about the practicalities of that PICS design? At some level PICS was pluralistic (multiple label providers, labeling schemes, filter rules, label bureaus). But considering "should not enable state censorship, surveillance or other practices that seek to limit this freedom", it probably still fails there. How would the "freedom of expression" principle here practically help to guide a group (re)inventing PICS, any more than just saying "It's complicated and there are fundamental tradeoffs"? Perhaps mapping out the landscape of tradeoffs is as useful as framing things in terms of principles...?

(sorry this was so long!)

azaroth42 commented 5 years ago

@danbri wrote:

using "the Web" to mean the core technology, vs "all that stuff we express via the technology".

Yes! This! Thank you Dan :) I implied two completely separate "webs", documents vs data, but what I meant to be saying was that the focus on browsers to the exclusion of other consumers of content provided via web technologies unintentionally created an artificial dichotomy that is not there in reality.

Perhaps mapping out the landscape of tradeoffs is as useful as framing things in terms of principles...?

And I agree with this as well, in conjunction with the principles. As with all of our horizontal review checklists/questionnaires/discussions, the principles are not orthogonal but require consideration in the context of the specific technology and must be balanced against both each other and the functionality needed for the technology to be useful.

To @hadleybeeman's question of how to be more inclusive of non-browsers, how about something like:

The architecture of the web is designed with the important notion of different classes of application that retrieve and process content, and represent the needs of its users. This includes web browsers, but also web-hosted applications such as search engines and software that performs some activity on behalf of a user given more structured data as content. This lends itself well towards this more ethical approach by allowing the person using the web to choose a browser, search engine or other application that best meets their needs (for example, with strong privacy protections).

Thanks for your engagement with the issue :)

masinter commented 5 years ago

For such a document published by the W3C TAG, the scope should cover technology in scope for the TAG. The scope of the TAG includes both technology published by W3C, as well as (through the TAG liaison function) that of other organizations (IETF, ECMA, ICANN, etc) whose specifications are normatively referenced.

I don't think it makes a lot of sense for this document to cover things not in scope for the TAG or to leave out technology that is in scope.

If you (the TAG) want to disambiguate the various uses of the academic "we" first person pronoun, I'd suggest replacing "we" with "the TAG" (as the authors of this document), "the W3C" (as the group expected to pay attention to findings), "the web community" (as the desired audience) or some other term.

hadleybeeman commented 5 years ago

We're coming back to this, and @torgo and I are really grateful for your comments here. This is helpful indeed!

If all are content, we propose to replace this paragraph in the introduction:

The architecture of the browser-based web is built from a user agent, the browser, that represents the needs of its users and works with application developers to deliver against them. This lends itself well towards this more ethical approach by allowing the person using the web to choose a browser that best meets their needs (for example, with strong privacy protections).

...with this paragraph, slightly modified from @azaroth42's wording for simplicity:

The architecture of the web is designed with the important notion of different classes of application that retrieve and process content, and represent the needs of its users. This includes web browsers, web-hosted applications such as search engines, and software that performs some activity on behalf of a user given more structured data as content. This lends itself well towards this more ethical approach by allowing the person using the web to choose a browser, search engine or other application that best meets their needs (for example, with strong privacy protections).

azaroth42 commented 5 years ago

Looks great, thank you so much for the engagement with this :)

hadleybeeman commented 5 years ago

Pull request with suggested edits. Thanks for everyone's help with this!

hober commented 5 years ago

This includes web browsers, web-hosted applications such as search engines, and software that performs some activity on behalf of a user given more structured data as content.

I think this sentence is overly broad. Specifically, the third category seems like it describes all software. For example, Microsoft Word “performs some activity on behalf of a user given more structured data as content” every time you use it to edit a Word document, but surely that is out of scope here.

danbri commented 5 years ago

@hober 's point goes to the heart of my discomfort with the document's positioning.

As we discussed earlier, we have both an expansive sense of "the Web" as a distributed information system in which arbitrary parties access information in arbitrary formats (pdf, Flash, whatever) over a variety of protocols (even .doc files via ftp:// and gopher:// are still on the Web in the expansive sense). But there's also a specific technical platform, the "web platform", with APIs, protocols, data formats, UX etc. whose specs are engaged with by the TAG, and whose canonical application is the modern JavaScript-enriched form of the classical "Web browser". The data-on-the-web agenda somewhat crosses these lines by trying to make the Web's content more programmatically manageable, without particular concern for the browser as a context for doing so.

Phrases like "The web must be for good." seem much better applied to the latter "web platform" sense of "Web", as a reminder to the Web's engineers that their work could have unintended consequences. The larger sense of the Web as its content (the words in pages; the recommender algorithms and content-blocking policies hidden inside the backends of its most successful sites; the behaviour of its popular software libraries; ... "and software that performs some activity on behalf of a user") seems beyond the TAG's immediate remit. As a general, universalist platform, the Web also needs to support use cases (at the content/site level) which are far from good.

Just as Turing machines and blank sheets of paper and knives and maybe even JSON are beyond good and evil, the Web, in its expansive sense, is ethically agnostic. It is ethically agnostic because it is for everyone, not just for nice people like us. But that doesn't mean that its engineers and the authors of its evolving standards get a free pass from considering the human impact of their technical decisions; privacy, security, accessibility, inclusivity etc. This is a weird and awkward tension, and an easily politicized one. Many of the ethically unsatisfying aspects of the current Web are beyond the scope of the TAG's core business, but it feels like the document has an under-articulated goal of aligning itself with wider trends, e.g. some sort of solidarity with the "we won't build it" tech-worker movement (https://www.wired.com/story/why-tech-worker-dissent-is-going-viral/ etc.), or with the application of similar ethical principles to the creation of "socially net positive" sites and content.

One way of doing that would be to acknowledge the difficulties in drawing ethical lines, or in scoping such guidance, and to note that where the Principles section says "the Web should/is/must...", in many of those cases, the burden of doing the right thing on the Web is split between standards engineering as happens around W3C and TAG, and Web content/site/platform design which happens out there in the world of Web content. There is a flow over time too, in which ideas from the latter work their way into the former, i.e. standardization. The TAG has more natural authority in the former but might benefit from solidarity with people applying similar principles to site and service content design. Many of the principles in section 2 are on topics that are closer to Web content than to Web platform engineering, e.g. "The web must support healthy community and debate", "The web must make it possible for people to verify the information they see". But there is also a path from Web content into the core platform as things - like notions of identity, or popular JS APIs - get standardized. When ethical decisions are made in the domain of Web content (sites/platforms, popular software packages) there is some kind of pluralism; different sites can handle things differently. When the platform itself tries to tackle those issues, there's less scope for pluralism since the standard needs to work for everyone.

A sentence like "The web should be a platform that helps people and provides a net positive social benefit." is aspirational, but the sad fact is that now that so much of society has moved to operate through the Web, "the Web" is also a system that facilitates bullying, bribery, propaganda, hate-mongering, and a thousand other forms of mischief and malice. It helps these things, and many others, happen faster and more efficiently, and at scale, and it is often beyond us to do much but hope that the positive consequences of the Web ultimately outweigh them. I feel the document somehow lacks an acknowledgement that it is part of the Web's very essence that it has to provide a universal platform for even socially toxic content, and that there are smaller practical ethical steps that Web standards developers can be taking that do more good than harm, even while building a system within which ugly things have to somehow have a home.

Ugh that was long again, sorry. I really want to like this document, but it still feels somehow off-balance in tone and scope. I guess I'm not the intended audience so should bow out for now!

hadleybeeman commented 4 years ago

Thanks, @danbri. You very much are in the intended audience, so we appreciate your thoughts.

You're right that the text in this document is aspirational -- but that's the point of it. We're trying to improve the work within the W3C community, and its ability to mitigate some of those challenges.

You're right that the web does facilitate bullying, bribery, propaganda, hate-mongering, etc. And we know that some of the decisions we make in writing specs can exacerbate those behaviours -- or mitigate against them. We want to make it as easy as possible to recognise the potential for encouraging those behaviours, and (ideally) to help authors make choices that won't make things worse.

hadleybeeman commented 4 years ago

@hober: Good point with this:

This includes web browsers, web-hosted applications such as search engines, and software that performs some activity on behalf of a user given more structured data as content.

I think this sentence is overly broad. Specifically, the third category seems like it describes all software. For example, Microsoft Word “performs some activity on behalf of a user given more structured data as content” every time you use it to edit a Word document, but surely that is out of scope here.

We propose changing that to:

This includes web browsers, web-hosted applications such as search engines, and software that acts on web resources.

As per the Data on the Web best practices, this should include data: "a resource may be a whole dataset or a specific item of given dataset."

Would that work for all of you?

hober commented 4 years ago

@hadleybeeman Yes, I think that works for me.

hadleybeeman commented 4 years ago

Addressed in pull request and merged at TAG Cupertino face-to-face.

w3ctag / ethical-web-principles

Should this document also consider data on the web? #4