cwilso opened this issue 2 years ago
This seems related to the old debate about deep linking. Under what circumstances should you (a) copy-paste from a site, (b) link-and-frame, or (c) summarize and refer? It can be very hard to know which URLs into a site are 'intended' to be citable/linkable by others, and which snippets can be 'excerpted by reference'. I'm not sure I can see my way to a general principle here. The reality of today's web is different from the assumptions (if they were ever explicit) behind the initial design.
There seems a lot of nuance here that needs teasing out; the 'information wants to be free' movement would probably have useful comment.
AB agreement that this is worth continuing to iterate on.
Another part of the "word cloud" is "creative control", or "moral rights of authors". Those might lead to shorter explanations. That would be good, because pasting the original content would immediately convince me to split this into two documents.
I think there are limits to the extent we believe in this principle. Framing it as "the right to have one's content forgotten" or "the right to limit deep linking" might help understand where the conflicts can arise.
from the original comment creating the issue:
The Web is built on the expectation that information about the online behaviour of users as it takes place on a given Web property, being the property that users intended to interact with, must remain under the exclusive control of that property's operator, which alone must determine the means and purposes of its processing.
I think there is an expectation among some of those who built the web - the users who choose to make it a place where that's good to build - that they have a say in the control and ownership of information about their online behaviour. I doubt that's really controversial as a principle, but working out how to make it operationally useful as a statement seems likely to be a complex and important discussion.
I also wonder if this is part of W3C's values, or is actually a more technical issue that we would expect WGs to deal with and to be described by the TAG, or if it has aspects that fall on either side of any such arbitrary divide.
I really struggle with this one. I absolutely agree that respecting the publisher is important. But the web was also built on the value that one should be able to refer to anything – that we can build any 'thread' of the web that serves us.
There is an underlying bug here, of course. We use the same mechanism to include resources of our own (such as images, style sheets, scripts) as to refer to, or even include, the resources of others: it's all URLs. This means that if I want to put an image on my own site, I have to give it a URL so that I can include it in my own pages – but that URL is then available to others, who can include the image without any of the context or attribution that I provide on my site. Ugh.
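To make that concrete, here is a minimal sketch (with a hypothetical URL) of how the same URL serves both purposes:

```html
<!-- On my own page: the image needs a URL so that my page can include it -->
<img src="https://example.com/photos/sunset.jpg" alt="Sunset over the bay">

<!-- On anyone else's page: the very same URL includes the image,
     stripped of the context and attribution my page provided -->
<img src="https://example.com/photos/sunset.jpg">

<!-- And the same URL is also how you merely *refer* to the image -->
<a href="https://example.com/photos/sunset.jpg">a photo on example.com</a>
```

Nothing in the markup distinguishes inclusion from reference; both are just URLs.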
Overall, we have two values bumping into each other: the web should be linkable, and it should be possible to refer to (rather than, say, copy-paste) other material; but publishers and owners should also have reasonable control over how and where their material is presented to users. These are not always easy to reconcile. Years ago we had publishers insisting that URLs should only link to the very top level of their site, which would remove the ability to link to a specific story and so defeat the linkability of the web.
(And that gets me to another peeve, which is a tangent here: if I want to point at a specific part of a story, only some documents expose where the nearest anchor is; generally, you'd have to read the HTML source to find out whether there is an anchor you can point at, grrr.)
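For illustration, a hypothetical example of what that anchor-hunting looks like: the target only exists if the author happened to put an id there.

```html
<!-- In the story's source, the author must have provided an id: -->
<h2 id="act-three">Act Three: The Council Vote</h2>

<!-- Only then can anyone else point at that part of the page: -->
<a href="https://example.com/2024/council-story#act-three">the council vote</a>

<!-- If no id (or legacy <a name>) exists near the passage, there is nothing to link to. -->
```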
Where that property's operator works with third parties, those third parties must only ever be service providers to that operator and must not be able to independently reuse data obtained from providing their service. No other monitoring on online behaviour is compatible with the well-being of the Web.
The above statement has to be left to each Web property to decide, not the W3C. Anything else places the W3C in a position where it is interfering with competition. The W3C can't use its position in technical standards to attempt to address perceived market failures.
However, the W3C could consider developing technical standards for the inclusion of licence agreements and payment mechanisms for content. Some copyright and IP holders might be okay with their content being used in AI training algorithms, or to create other derived works. Others may not. Creating a technical standard to automate these permissions at scale might be compatible with the W3C's remit.
There's an important point in @jwrosewell's comment above regarding the information about how people interact with the web (which is valuable information). I don't think that we should be trying to decide whether the web "prefers" monolithic / centralised providers of an end to end service, or many providers collectively creating a service for users. Both seem like a perfectly legitimate approach to building something for people.
But it is important to relate this to transparency, and complexity.
For a human, it is hard to understand both the potential scope of use that a very large organisation might make of data as a "first party", and what happens to data when bits of it are shared around a number of different organisations.
I think that there are plenty of good reasons for us to want a web that is supported by a healthy ecosystem that incorporates multiple players, has a variety of suppliers and competition, and so on. It's healthy to have choice and competition; it's good for everyone. There is a natural tendency in the information-tech sector towards the 'large player', so thinking, when we write standards, about whether there are ways to weaken that advantage and encourage diverse ecosystems can very much be a goal, I think.
I think that it might be useful to expand on the context that underlies this principle and to provide some historical contrast between the web not very long ago and today.
The initial web architecture was built with a very strong sense that a server would have full authority over its domain. (I provide some context about that based on the research I did to guide IPFS in helping fix the web's problems; I stuck to a short selection, but there's a lot more if you dig through TAG archives and Tim's design notes.)
To get a sense for how strong that expectation was, the act of minting a name in someone else's domain (like robots.txt or favicon.ico) was referred to as expropriation. And they didn't mean it in an emotional, hyperbolic way — they called it expropriation in the AWWW because that's exactly what they meant: taking someone's property. (The TAG's archives and other similar sources support the understanding that this was a strongly held sentiment.) To make the point clearly: in 2004, expropriating a name from a website was considered a sufficiently concerning issue to command hours of work from some of the web's best experts.
Twenty years later, if you're a typical consumer-facing business (ie. not big tech, infra, startup… but rather commerce or publisher), which is arguably one of the most important demographics in terms of making things for people and also one of the least represented in W3C, it is expected that you must provide:

- Your content, not in exchange for traffic but also so that someone else can monetise it in voice assistants and AI systems.
- Your audience data via ads/marketing, not because it's needed to make ads on your site (or ads you place elsewhere) successful but in order to support MFA sites and intermediaries in making money.
- Labour in producing a host of formats (AMP, Apple News, etc.) so that other people can build products that benefit them.
- Labour in producing metadata just to make it easier for other people to build products (schema.org).
- Labour in improving your site's performance even if it's not near the top of issues that your actual users need.
- Your audience data via browsers that have cleartext sync defaults (Chrome, I don't know if Edge still does) so that it can improve (at least) a search engine that competes with you for ad money.
- 30% of your sales in app stores (even if your app uses web technologies, and it draws from the same budget even if it doesn't)
- 30-70% of your ad revenue.
- Random features that some product person you never met decided have to be universal (eg. making your crosswords app dark mode even though only a tiny fraction of your users end up using that).
(Of course, you can ignore all of the above and hope that people will type in your URL or sideload your app. Good luck!)
I'm certainly forgetting things, but this is the baseline expectation. We've come a long way in 20 years if this isn't considered confiscatory expropriation by even the most timid speaker. The background for this issue is that, if you consider these demands, it shouldn't be surprising that web sites are struggling to build experiences that retain users — where are you supposed to find the time and money to support building something good on top of the above?
If we want a thriving web, we should make it so that people who make things on the web can actually thrive. It is completely unrealistic to expect anything to make the web thrive so long as most of the above statements remain true. I believe that this is a mission-level objective (if only because given the current confiscatory system, there won't be a web left otherwise).
I'm more than happy to consider any and all parts of my formulation as not capturing the issue well, but the idea is to put some muscle behind the "authors" part of the constituencies. (I think it's telling that some of the most powerful entities are not represented in the Priority of Constituencies. If we were to document current practice, search engines and app stores would be more important than users.)
Now, a couple notes on this:
I wonder if it would be helpful to look at this in the context of provenance and attributions, especially in distributed and syndicated content. This is not a new issue at all. As @darobin explains, this is a major concern for publishers (book publishers, news publishers, bloggers, musicians, things that get embedded into generative AI, etc) as well as for many others, including users who would like to understand the origins of materials. Ensuring proper attribution and provenance can be a tool against misinformation as much as it can be helpful in making sure that authors get paid. Perhaps https://www.w3.org/2001/tag/doc/distributed-content/ is a good starting point for discussion.
@darobin provides a nice summary of some of the market failures associated with the web that I recognize. I'd also add;
But no matter how long the list, there's no mandate for the W3C to usurp the role of regulators. The Antitrust Guidelines prohibit this.
We certainly do need to apply our Antitrust Guidelines consistently and assist regulators where we can.
If there's a willingness we might also wish to work with other bodies to develop technical standards to assist all web stakeholders understand the origin and rights associated with copyrighted material and intellectual property as I believe @TzviyaSiegman suggests. This approach could also be applied to the processing of personal data. The reason to work with other bodies is to ensure that the solutions are applicable across digital platforms and not just the web.
I don't think this captures the nuances; people don't work on the content of their site "just to make it easier for other people to build products", that would be worse than pointless. There is a positive reason that they take these steps: to gain audience, and so on. Some of them are not even anything to do with the web. Some are no different than what's been done for ages ("as reported by the Thames Tribune, xxx…").
I quite appreciate the overall goal here, and that the web platform has features (like the inability to distinguish reference from inclusion, both of which use URLs) that compound the problem. But.
@darobin I think I get where you're coming from, but characterising this as "sovereignty" is too absolutist, and invokes some uncomfortable comparisons with current events/trends (both political and social).
Protocols are agreement; the Web is agreement;[^psd] they require cooperation. What you're concerned about, I think, is when the terms of that agreement are lopsided against everyday users. So, I think this relates to the priority of constituencies, but needs to go further, because it involves parties and powers beyond those listed.
What concerns me about your original writeup above is that it can be read to advocate data silos, where we build a Web that, for example, precludes RSS.
Maybe that's a direction to pursue: what qualitative difference is there between RSS and "Labour in producing a host of formats (AMP, Apple News, etc.) so that other people can build products that benefit them"?
You mention one very technical aspect that I do agree with -- that "a server would have full authority over its domain." That's enshrined in RFC8820, which is IETF Best Current Practice.
[^psd]: As Paul Downey captured so well.
@darobin I think I get where you're coming from, but characterising this as "sovereignty" is too absolutist, and invokes some uncomfortable comparisons with current events/trends (both political and social).
+1
I recommend we re-use existing characterisations, such as the one in the TAG finding "Distributed and syndicated content", which refers to the "primacy of URLs and origins on the web".
[...] So, I think this relates to the priority of constituencies, but needs to go further, because it involves parties and powers beyond those listed.
Looking at the Vision document in its current form, it would be tricky to edit existing principles or add such a new principle (because the document is high-level and what Robin suggested covering is very specific). But if the AB continues to want us to iterate on it, and our group concurs there's merit in crafting an operationally useful statement, then I believe the various facets of that statement are (in rough logical order):
@TzviyaSiegman Provenance may provide a small part of the solution, but I don't think that it helps in the mission and values space?
@dwsinger I'm not sure which nuances aren't captured? That feels like hinting at a problem with my argument without actually specifying it. Which ones aren't related to the web?
I'm sorry, but "people don't work on the content of their site 'just to make it easier for other people to build products', that would be worse than pointless. There is a positive reason that they take these steps: to gain audience, and so on" simply doesn't describe the reality of what being a content producer on the web is. It's a bit like saying "people don't pay protection money 'just to make mobsters rich', that would be worse than pointless. There is a positive reason they take these steps: to stay alive, to avoid arson, etc." You may think that this is an exaggeration, but the fact of the matter is that 1) very few sites do that kind of work voluntarily, 2) they don't benefit from the work at all (at best they maintain the status quo), and 3) when you put it all together it adds up to a lot of work. Even if you think that some of that work might be useful (the web needs to evolve), we have to acknowledge that extracting ever more from the same people in exchange for the exact same returns isn't sustainable — this is very basic maths.
It's clear that the situation won't improve if left to its own devices. We might not have easy ways to ensure that the people who build the web can push back on these unilateral impositions and excessive rent, but that doesn't mean that it shouldn't be taken into account.
And I don't think that being dismissive about what is a very concerning situation helps.
@mnot I'm perfectly fine with changing the name — in fact I'm pretty sure that we already agreed to do that, though I don't think anyone had suggestions. My usage of sovereignty is loosely Montevidean, but I'm not married to it.
You're entirely right that the web is an agreement, which in turn requires cooperation. I am in general concerned that this agreement is lopsided against everyday users but in this issue my concern is that it is lopsided against what the constituency lists as "publishers." Specifically, I think that it is lopsided to the point at which it exceeds most publishers' ability to produce value on the web.
Framing this in terms of the constituency, I do think that a change is needed: it's a problem that the most powerful entities on the web (various forms of intermediaries) aren't mentioned there at all. I think this reflects just how deeply this community is blind to the web's power dynamics. Maybe this could usefully be extracted into its own separate issue?
Your RSS question is a good one. There are plenty of reasons that publishers would benefit from something like RSS (it's a great way to bring people back!) and I am convinced that we can build a refreshed feed system that's great for people and publishers. That can only happen if publishers have voice (they bring requirements and help design the system) and, ideally in rare cases, exit (which means that they might make RSS users unhappy by not supporting it, but they won't be arbitrarily punished in eg. search, social, etc.). One way to think about it is that publishers would support something for its intrinsic value and not for an arbitrary carrot or stick.
@koalie Note that the initial context of the issue wasn't copied over here. The primary point was to produce an example of a principle that could be sufficiently specific to be foundational. That's why this is less high-level than the rest of the document. I still think that point holds, though I think it is now more or less consensus.
It's not that publishers (many of whom are in fact individual users) are not producing masses of value. It is just that most of it is captured, as you note in the next paragraph @darobin, largely by a somewhat small set of intermediaries, to the extent that some of this already small band have gone from more or less nothing to being among the richest organisations in history, in fairly short order.
(But then, protection rackets normally rely on there being a set of producers who have the money to funnel to the mobsters...)
@chaals True, but in this case I mean it in the stronger sense: the capture is reaching such levels that the ability to produce any value at all is compromised. This includes rent extraction of course but also captured work imposed "for their own good." Either way, I don't see a path forward other than giving that constituency the power to be taken into account.
Thanks for the issue, @darobin. Given the short form of this document, how would you recommend wording this? We do intend to create a strategic objectives document that will go along with this in the future.
@dwsinger I'm not sure which nuances aren't captured? That feels like hinting at a problem with my argument without actually specifying it. Which ones aren't related to the web?
well, you say for example
Labour in producing a host of formats (AMP, Apple News, etc.) so that other people can build products that benefit them.
I am quite sure that publishers don't enable AMP or Apple News solely so that Google or Apple can make more money; that would be stupid. They do it because a significant chunk of their intended audience want it, or to enhance their brand and visibility, and so on. There is both incentive and cost/disincentive here, as in many of your examples. That's what I mean by nuance.
I don't think flippantly equating other businesses with mobsters helps, and neither does failing to explain and explore the complexities of the situation. Please be more careful with your analogies?
I don't disagree that there are issues here; there have been issues back to the dawn of the web (the old 'deep linking' discussion is one of them). For example, somewhere along this spectrum we cross a line:
I am sure there are more examples.
Other aspects of the nuance here are
Sometimes the pushback goes rather far; in the deep-linking debate, some publishers wanted it to be impossible to refer to a specific story or paragraph, and only to link to the top-level of their site. Not surprisingly, people found references like "this inspiring event" with a link to ThamesTribune.com to be rather … unhelpful.
I am quite sure that publishers don't enable AMP or Apple News solely so that Google or Apple can make more money; that would be stupid. They do it because a significant chunk of their intended audience want it, or to enhance their brand and visibility, and so on. There is both incentive and cost/disincentive here, as in many of your examples. That's what I mean by nuance.
In that case, I can readily address that.
Apple created a News product that it shipped by default to its massive fleet. Because of how defaults work, this immediately created a new audience platform that competed with existing ones. The terms that Apple offered publishers were dismal. This presented publishers with a prisoner's dilemma: the best outcome for them would be if no one agreed to go to Apple News because doing so is a net loss of revenue, brand equity, audience knowledge, advertising scale (which matters a lot), direct relationship, product development… However, since AN was capturing traffic away from other channels, whoever defected and signed onto the platform would see a burst of traffic relative to the others. And just as is the case in the prisoner's dilemma, publishers couldn't cooperate to reach the better outcome: it's (ironically) considered anticompetitive in many jurisdictions. The net result is simple: the introduction of Apple News was a net negative for pretty much any publisher individually and destructive to the media industry as a whole. No one thought that this was better for their audience, revenue, or user experience. It's a substitutive product. The only reason AN has content is because Apple did what platforms do: extract rent by weaponising competition against others.
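To spell out the prisoner's-dilemma structure, here is a purely illustrative payoff sketch (the numbers are hypothetical ordinal rankings, 4 = best, for "you" as one publisher versus the rest of the field; each cell reads "your payoff, others' payoff"):

| | Others stay off AN | Others join AN |
| --- | --- | --- |
| **You stay off AN** | 3, 3 (status quo holds) | 1, 4 (your traffic is siphoned to the defectors) |
| **You join AN** | 4, 1 (short-term traffic burst at others' expense) | 2, 2 (everyone worse off than the status quo) |

Joining dominates for each publisher individually, yet the all-join outcome is worse for every publisher than the all-refuse one, and coordinating on "all refuse" is precisely what competition law makes difficult.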
As a media executive who has been directly involved with peers on these very topics in policymaking contexts and who was physically in the room over several years when many of those decisions were made, I'm happy to provide a whole lot more nuance about these aspects. I can fill in a lot of lively detail because I remember much of it very well. It's just that, in my experience, the people who work for those companies don't find it collegial to see that nuance shared. Hence the kinder, gentler, sanitised "labour so that other people can build products that benefit them." It's the gist of the truth with less finger-pointing, if you will.
I can do AMP too, if you want, but I think @cwilso has heard enough about that one.
If you go back to what I wrote, you'll see that I didn't equate another business with mobsters: the comparison is of analogies, not of businesses. Saying that businesses chose to publish through Apple News because they wanted the audience is like saying that people pay mobsters because they want arson insurance. The analogies are the same in how they relate to reality.
I'm not sure what to make of the whataboutism of what you then say about publishers. Nowhere do I say that publishers are always right or should be on top of the food chain.
the introduction of Apple News was a net negative for pretty much any publisher individually and destructive to the media industry as a whole
I struggle to understand what W3C could do even as a matter of principle here. Yes, the web has facilitated much disruption of the news industry's traditional business models and power structures ... much of it for the worse IMHO. And the "information wants to be free" web culture, along with W3C's non-neutrality toward PATENT royalties as a business model, accelerated the decline of many long standing businesses (such as licensed software) that touch the web. But W3C has traditionally been neutral ground where members with competing businesses and different business models find common ground on some things while competing/disagreeing/counter-lobbying on others.
No one thought that this was better for their audience, revenue, or user experience.
I have to disagree about user experience. The user experience of Apple and Google News is simply better than most media industry websites, many of which are full of annoying popups and links to malvertising.
Perhaps the BigTech News sites don't pay back a fair share for the content they republish... but $33 million/year https://www.reuters.com/business/media-telecom/new-york-times-get-around-100-million-google-over-three-years-wsj-2023-05-08/ doesn't seem trivial. Maybe that's still a net negative for publishers, I don't know.
As we've discussed in Issue 90, W3C is NOT "neutral" on matters of principle, but strives to be neutral with respect to the different parties in a discussion, their cultures/languages/abilities, and the range of legitimate ways they make money. Maybe there's a fundamental issue of principle here, but I'm not seeing it yet. Maybe something about the need for attribution / acknowledgement as content is shared? But the discussion in this issue seems more about money than attribution 🤷🏽‍♂️
I think it's worth paring down this discussion because it seems to be going a bit all over the place.
The motivations for the issue are:
Can the W3C fix all of the web's problems? I doubt it. We can't make a website put its users first, for instance. However that doesn't keep us from documenting a principle and abiding by it where we can.
Of note, some of these imposed terms include aspects that we would broadly agree pursue a valuable goal (eg. performance) and could arguably be standardised or be the subject of W3C work (eg. CWV). I'm using this example because I think it illustrates well how good intentions can go wrong: a given change on its own might be fine, but the aggregate of all the work (and value) extracted from just one constituency can nevertheless be damaging. And imposed terms mean there is no backpressure to keep demands realistic.
What I propose we do:
First, I think that we need to add intermediary and discovery actors to the Priority. It might make it less catchy, but frankly it's not that catchy to begin with. Having a priority of constituencies that doesn't list the most powerful actors is at best a bug and at worst revealing of the power dynamic. There are several ways to do that; I propose we finesse that elsewhere.
Beyond that, I think my answer depends on how the rest of the document evolves. The initial proposal was also to illustrate what having more detailed principles (à la EWP) would look like. If we go that route, then I think that this proposal can be evolved into a Voice & Exit principle in which the W3C must give voice to the constituencies impacted by a decision and to the extent possible empower them to exit imposition. If we go the route of a tighter, more focused document then it depends on what that document's focus is. The idea could get subsumed into a wider governance/checks-and-balances aspect or, if it's out of focus, pushed out to strategic considerations. It's hard to tell at this point, but at tomorrow's meeting we're looking at suggestions from @LJWatson that I think could help make that decision.
PS: I can't comment on the NYT/Google deal, but I'll say this: the systematic, lopsided extraction of value from publishers which I described above is enough of a problem that publishers are considering models that are just apps+aggregation with licensing revenue. We're probably at some distance from that still, but those conversations are taking place (and GAI search pushes in the same direction). I would like to suggest that the prospect of publishers dropping off the web (or even just limiting their presence to top-of-funnel), and the drivers behind that trend, are something that this group might want to consider more seriously than some of the replies would indicate.
I agree we need to focus this discussion. I don't think anyone feels that there is no problem here: I don't, to be clear. The ability to co-opt other sites' material onto one's own is a problem, for a start. But I don't think that the principle is 'origin sovereignty', as that would imply that origins get control over links into and quotes of their material, neither of which is in accord with the design of the web or with societal principles; so there is something more nuanced to be teased out.
@darobin suggested this in https://github.com/WebStandardsFuture/Vision/pull/37. His proposed text for addition is replicated below.
Origin Sovereignty
Origin sovereignty is the principle according to which the operator of a given Web property (traditionally listed as the "author" though these operator entities vary widely in nature) should be the sole and exclusive controller of information pertaining to that Web property (typically a Web site, often mapped to a domain or origin). Such information covers two primary aspects: the content that is published on that property, and knowledge of the audience for that domain where the audience is the set of users who intentionally interacted with that property. Origin sovereignty leads to two primary considerations for how the Web must work:
These considerations ensure that website publishers can operate a business without unfair competition from other parties that can impose tracking, force their content into aggregated experiences, or rely on their control of infrastructure such as operating systems, ad serving, online portals, or user agents to unfairly extract knowledge about others' audiences.
Exceptions can be considered to sovereignty when legitimate systems of collective governance exist that can support the common good (eg. a data trust of appropriately anonymised online behaviour in support of greater end-user security).