w3c / feedvalidator

W3C-customized version of the feedvalidator (forked from https://github.com/rubys/feedvalidator/)

Copyright concerns on RSS 2.0 Specification page #106

Closed AJ-Ianozi closed 1 year ago

AJ-Ianozi commented 1 year ago

Bringing this to W3C's attention, though I'm not the copyright holder. It concerns this page: https://validator.w3.org/feed/docs/rss2.html

This was initially brought up by Dave Winer, original author of the spec:

The W3C has a copy of my RSS 2.0 spec on their website. At the bottom of the original document it says: "The author of this document is Dave Winer, founder of UserLand software, and fellow at Berkman Center." Below that is a Creative Commons license which says: "The content found on this site is made available under the terms of an Attribution/Share Alike Creative Commons license." All of that is missing from the W3C's copy of my spec. It's pretty obvious that it's my writing. My name has been removed, as has the original license. That's wrong. They must fix that. My recommendation, just point the original document and forget about hosting a copy. The Harvard website is not going anywhere.

The discussion made it to Hacker News where more context was provided by user "skilled":

It did have a mention of "© Copyright 1997-2002 UserLand Software. All Rights Reserved." up until late 2021, https://web.archive.org/web/20211130062529/https://validator.w3.org/feed/docs/rss2.html and then shortly after (Feb 2022, the new page) doesn't have any of that at all, https://web.archive.org/web/20220130145338/http://validator.w3.org/feed/docs/rss2.html But it never had the phrase "Dave Winer" anywhere, not in the old version of the page or the new one.

It appears that the copyright wording on the page was changed in early 2022 via PR #68, which removed the UserLand Software notice.

rcaden commented 1 year ago

I am the chairman of the RSS Advisory Board, the organization credited in the final line of the RSS 2.0 specification that the W3C is republishing:

"This document is authored by the RSS Advisory Board and is offered under the terms of the Creative Commons Attribution/Share Alike license, based on an original document published by the Berkman Center for Internet & Society."

We have published the specification for 20 years, making 10 revisions over that span. We've always offered it under the Creative Commons Attribution/Share Alike license, so any other entity is free to redistribute the document under the terms of the license.

The W3C is following the license correctly and using our preferred attribution.

The current version of the specification is available here:

https://www.rssboard.org/rss-specification

We also publish an RSS Best Practices Profile under the Creative Commons Attribution/Share Alike license, a guide for software developers and web publishers implementing RSS:

https://www.rssboard.org/rss-profile

scripting commented 1 year ago

I explained the problem, with links and screen shots in this blog post.

Net-net: You're hosting a document I wrote, with the credit and copyright removed.

BTW, I am a former member of the W3C and worked on the SOAP protocol in a group you hosted and am proud of the credit I got for that work. Screen shot. It's a parallel situation. It's as if someone removed my name from that spec, and the copyright notice below it.

I think you'll agree that RSS 2.0 was a significant contribution to the world wide web and I'm continuing to build on this work every day. And I'm just as proud of that work as I am with my contribution to SOAP and other W3C projects.

scripting commented 1 year ago

BTW, here are the details of the publication of the spec on the Harvard website on July 18, 2003.

http://scripting.com/2003/07/18.html#rss20News

AJ-Ianozi commented 1 year ago

So I think the real question is whether or not rssboard is attributing David's work properly?

From what I can tell, the last version of the spec released by rssboard to attribute Dave was RSS 2.0 (version 2.0.1-rv-6):

This document is authored by the RSS Advisory Board and is offered under the terms of the Creative Commons Attribution/Share Alike license, based on an original document published by the Berkman Center for Internet & Society at Harvard Law School authored by Dave Winer.

Version 2.0.8 and onward changed the attribution to:

This document is authored by the RSS Advisory Board and is offered under the terms of the Creative Commons Attribution/Share Alike license, based on an original document published by the Berkman Center for Internet & Society.

I ran a Copyscape comparison of Dave's original page against the latest specification, and the spec is still a ~90% match: Compare Two Web Pages or Articles - Copyscape.pdf

According to the CC license linked on both pages:

If supplied, you must provide the name of the creator and attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material. CC licenses prior to Version 4.0 also require you to provide the title of the material if supplied, and may have other slight differences.

Dave's page explicitly lists the author, which to me would be who to attribute as the author as defined by the CC license:

The author of this document is Dave Winer, founder of UserLand software, and fellow at Berkman Center.

Shouldn't the attribution provide that information rather than just saying "based on an original document published by the Berkman Center for Internet & Society"? It should probably also include a link to the original source.

rcaden commented 1 year ago

The Berkman Center for Internet & Society owns the copyright of the RSS 2.0 specification and released the document under the Creative Commons Attribution/Share Alike license in 2003.

The credit line of our version of the document changed in 2006 because Dave Winer did not want to be associated with the RSS Advisory Board or its publication of the specification. He had resigned from the board two years earlier and did not agree with the board's decision to continue operation.

In the last 17 years no one has asked us to change the credit line, including Winer or UserLand Software (which gave the copyright to Berkman).

AramZS commented 1 year ago

The Berkman Center for Internet & Society owns the copyright of the RSS 2.0 specification and released the document under the Creative Commons Attribution/Share Alike license in 2003.

The credit line of our version of the document changed in 2006 because Dave Winer did not want to be associated with the RSS Advisory Board or its publication of the specification. He had resigned from the board two years earlier and did not agree with the board's decision to continue operation.

In the last 17 years no one has asked us to change the credit line, including Winer...

I'm not really clear about the broader situation, but if an author is asking for attribution to be re-added to a work considered to be under a CC attribution license--which is what appears to be happening here--even if they had been considered to have wanted it removed in the past, I think it's good form to re-add them? I'm sure there's an argument to be made over what is or is not required, but that's different from good practice.

rcaden commented 1 year ago

I have submitted a pull request to address Dave Winer's request to be attributed in the spec.

scripting commented 1 year ago

W3C please reject this request.

Here are your choices:

  1. Remove the document from your site and point to the original.
  2. Make a copy of the original to host on your site.

Anything else is just prolonging the perhaps understandable mistake you made.

I would much prefer the first, to get the W3C out of this mess, and let us return to working to make the web work better.

rcaden commented 1 year ago

I'm not really clear about the broader situation, but if an author is asking for attribution to be re-added to a work considered to be under a CC attribution license--which is what appears to be happening here--even if they had been considered to have wanted it removed in the past, I think it's good form to re-add them? I'm sure there's an argument to be made over what is or is not required, but that's different from good practice.

I think it is good practice to honor a request to modify the attribution of a Creative Commons-licensed work in the credit line when the author wants it to be changed. One thing I also did in the pull request was to revise "Berkman Center for Internet & Society" to "Berkman Klein Center for Internet & Society" because the organization changed its name in 2016.

rcaden commented 1 year ago

The RSS Advisory Board copy of the specification also reflects the attribution change:

https://www.rssboard.org/rss-specification

dontcallmedom commented 1 year ago

I don't think there is anything wrong with the current document, but I also don't think there is anything that makes it critical for it to be hosted on the W3C feedvalidator service. To reduce the maintenance burden, I've submitted #109, which replaces the content of the document with a link to https://www.rssboard.org/rss-specification

scripting commented 1 year ago

@dontcallmedom -- why not just point to the original?

https://cyber.harvard.edu/rss/rss.html

it really sucks to have something you worked so hard at for so long be defaced in this way.

dontcallmedom commented 1 year ago

the point is to link to the version of the specification on which (to the best of my knowledge) the validator was built

scripting commented 1 year ago

@dontcallmedom

RSS 2.0 is frozen. There are not "versions" of the spec. 

AJ-Ianozi commented 1 year ago

@dontcallmedom -- why not just point to the original?

https://cyber.harvard.edu/rss/rss.html

it really sucks to have something you worked so hard at for so long be defaced in this way.

I think this comes down to who W3C considers the authority for RSS2 today.

I was looking at the Change Log that the RSS Board published, and it looks like every update made to the spec up to 2004 is included in the Harvard site (which lines up with the timeline of when Rogers mentioned you were on the advisory board). Every update since then has just been changes to formatting and wording rather than changes to the technical spec.

While from a technical standpoint they're identical, they do have some differences at this point, and if rssboard comes up with a new donations attribute for <item> and the Harvard site doesn't list it, which copy should W3C run their validator against?

scripting commented 1 year ago

@AJ-Ianozi --

first, thank you for looking at the change log. I hope you will keep doing that, and keep us informed if any lines get crossed.

second, i'm sure the "rss advisory board" know that they can create a new namespace as I have for all their innovations, and document it in the docs for the namespace. i've been adding items for years and years, and no harm has come to anyone. one of the elements in the source namespace seems to be on its way to becoming a standard. so we make progress with this framework.

the spec was 20 years old last year. it's survived this long without anyone adding random attributes to it. no regrets at all about choosing the course outlined in the roadmap. it worked. ;-)

here's the namespace I've been adding to...

http://source.scripting.com/

there's no prob

scripting commented 1 year ago

BTW, let's make sure everyone reading this knows what's in the Roadmap section.

RSS is by no means a perfect format, but it is very popular and widely supported. Having a settled spec is something RSS has needed for a long time. The purpose of this work is to help it become a unchanging thing, to foster growth in the market that is developing around it, and to clear the path for innovation in new syndication formats. Therefore, the RSS spec is, for all practical purposes, frozen at version 2.0.1. We anticipate possible 2.0.2 or 2.0.3 versions, etc. only for the purpose of clarifying the specification, not for adding new features to the format. Subsequent work should happen in modules, using namespaces, and in completely new syndication formats, with new names.

ttepasse commented 1 year ago

(Sorry for the digression, everyone!)

http://source.scripting.com/

Dave, that site didn’t specify a way to give feedback, so I’m writing here, since I’ve just seen it:

That document doesn’t specify a URI for the namespace.

As you may remember, XML namespaces are not identified by the prefix of elements – in your examples source: – but by a globally unique URI, to which a prefix is then bound in the xmlns declaration. According to the XML processing rules, as implemented in all XML libraries I know, a namespaced element is uniquely identified by the combination of namespace URI and element name, e.g. ("http://www.w3.org/2000/svg", "polygon") or ("http://www.w3.org/1999/xhtml", "details"), and an XML processing program should only process these. Accordingly, the prefix could be any string, as long as that string has been bound to the right namespace URI. E.g. in XML land source:markdown could be written quelle:markdown or fons:markdown if those prefixes had been bound to the right namespace URI in their respective XML documents.

You link to two example feeds, and the XML of both binds the source prefix – but they use different namespace URIs, which for an XML processor means that the elements prefixed with source are in different namespaces and are therefore counted as different elements.
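
To illustrate the point about prefixes and namespace URIs, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The three one-element documents are made up for illustration, and `http://example.com/other-namespace` is a hypothetical second URI, not one taken from any real feed. The parser reports each element as a (namespace URI, local name) pair, so the prefix drops out entirely.

```python
import xml.etree.ElementTree as ET

# Same namespace URI bound to two different prefixes.
doc_a = ('<item xmlns:source="http://source.scripting.com/">'
         '<source:markdown>hi</source:markdown></item>')
doc_b = ('<item xmlns:quelle="http://source.scripting.com/">'
         '<quelle:markdown>hi</quelle:markdown></item>')

# Same prefix as doc_a, but bound to a different (hypothetical) URI.
doc_c = ('<item xmlns:source="http://example.com/other-namespace">'
         '<source:markdown>hi</source:markdown></item>')


def first_child_tag(xml_text):
    """Return the expanded name of the first child, as '{uri}local'."""
    return ET.fromstring(xml_text)[0].tag


tag_a = first_child_tag(doc_a)
tag_b = first_child_tag(doc_b)
tag_c = first_child_tag(doc_c)

print(tag_a == tag_b)  # different prefix, same URI: same element
print(tag_a == tag_c)  # same prefix, different URI: different element
```

Under these assumptions the first comparison is true and the second is false, which is the situation described above: two feeds that both write `source:` but bind it to different URIs are using two different namespaces as far as a parser is concerned.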

scripting commented 1 year ago

@ttepasse -- do you have a suggestion?

ttepasse commented 1 year ago

Simply decide on one namespace URI for your namespace and write it first thing in the spec, so that people who want to process the elements in your spec know what to implement. Every other namespaced XML spec I know does it.

And of course use it yourself, where you generate the XML feeds and where you process them. Good example and all that.

I’d use http://source.scripting.com/ – I prefer simple URIs and the spec seems to have its permanent home there.

scripting commented 1 year ago

That’s exactly what I do.

You can see it in use in the feed for my blog.

http://scripting.com/rss.xml

ttepasse commented 1 year ago

But in your link blog feed you’re using a different namespace by binding the prefix source to the URI http://source.smallpict.com/2014/07/12/theSourceNamespace.html. Different URI means different namespace.

scripting commented 1 year ago

Let’s take this offlist, my email address is on the about page of my blog.

http://scripting.com/?tab=about

ttepasse commented 1 year ago

I couldn’t write any more in an email than here.

scripting commented 1 year ago

Okay -- i just don't know your email address, and this is so far off-topic, i don't want to keep this going here.

You didn't say the address of the feed you cited, so i can't investigate. In any case, when the url of the namespace changed it wasn't practical to find all the feeds that cited it and change them.

But please don't respond here, @ttepasse -- and this is basically a concluded discussion as far as I'm concerned, and it's way way off-topic.

But thanks for calling my attention to this.

scripting commented 1 year ago

back to the main thread. :smile:

i spent some time thinking and writing, and this is where i'm at.

http://scripting.com/2023/06/24/134722.html

AramZS commented 1 year ago

I do think that this question of which spec to link to can be answered by not pointing to another site @dontcallmedom ?

I don't think I see an objection here to the W3C hosting it, just to making sure attribution is correct. I am always hesitant to link to an offsite spec in this way in either location. Links rot[^1] and as nice as it would be to hope any of the three URLs noted here (or more!) will be on the web forever, I don't think it is good to depend on that?

All versions of the spec seem to be clear on the CC license. They are all Attribution/ShareAlike. That gives the W3C the permission to host it themselves. I think it is reasonable to do as @rcaden suggests and just update the W3C copy so it has correct credit in line with the other noted copies.

[^1]: EDIT: A good related example is that MediaRSS spec continues to be linked to at its old Yahoo URL on the top of many feeds, I see that link on podcast feeds all the time, even though Yahoo long ago let that page die and didn't even bother to forward it anywhere.

scripting commented 1 year ago

@AramZS -- this is the archive.org snapshot of the original spec on July 22, 2003.

https://web.archive.org/web/20030722225559/http://blogs.law.harvard.edu/tech/rss

rcaden commented 1 year ago

EDIT: A good related example is that MediaRSS spec continues to be linked to at its old Yahoo URL on the top of many feeds, I see that link on podcast feeds all the time, even though Yahoo long ago let that page die and didn't even bother to forward it anywhere.

Yahoo transferred the Media RSS Specification to the RSS Advisory Board in 2009 and it is hosted here:

https://www.rssboard.org/media-rss

The transfer took around 21 months to arrange and was made possible by Sapna Chandiramani and Nilesh Gattani at Yahoo and Randy Charles Morin and Ryan Parman on the board.

P.s. The URI identifying a namespace is not supposed to change even when the documentation moves. Media RSS still uses "http://search.yahoo.com/mrss/" because that is what implementers of the namespace expect. If new RSS feeds used a different URI for Media RSS, existing implementations would not know that it's Media RSS.
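
A rough sketch of why that matters for consumers, using Python's standard `xml.etree.ElementTree`. The feed contents, the `make_feed` and `media_content_urls` helpers, and the `example.com` enclosure URL are all hypothetical. The second feed deliberately (and incorrectly) binds the `media` prefix to the documentation's current address instead of the established namespace URI.

```python
import xml.etree.ElementTree as ET

# The namespace URI that existing Media RSS implementations look for.
MEDIA_RSS_URI = "http://search.yahoo.com/mrss/"


def make_feed(namespace_uri):
    """Build a tiny illustrative feed that binds 'media' to namespace_uri."""
    return (
        '<rss version="2.0" xmlns:media="%s"><channel><item>'
        '<media:content url="http://example.com/a.mp3"/>'
        '</item></channel></rss>' % namespace_uri
    )


def media_content_urls(feed_xml):
    """Collect url attributes of Media RSS <content> elements."""
    root = ET.fromstring(feed_xml)
    # Elements are matched by namespace URI plus local name, never by prefix.
    return [el.get("url") for el in root.iter("{%s}content" % MEDIA_RSS_URI)]


feed_expected_uri = make_feed(MEDIA_RSS_URI)
# Hypothetical mistake: using the spec's new home page as the namespace URI.
feed_changed_uri = make_feed("https://www.rssboard.org/media-rss")

print(media_content_urls(feed_expected_uri))
print(media_content_urls(feed_changed_uri))
```

The first call finds the enclosure; the second returns an empty list, even though both feeds write `media:content`, because the consumer only recognizes the original URI.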

scripting commented 1 year ago

@AramZS -- that is not the only or even main objection.

RSS 2.0 is frozen. There aren't versions of the spec. This seems to be a basic misunderstanding.

The roadmap is part of the specification for the format. If you claim to validate RSS feeds you have to keep your validator valid.

We should be watching that. Make sure they don't start saying feeds aren't valid RSS yet are following the RSS 2.0 spec.

And for that the only authoritative spec is the original one.

It's really simple, by design. Easy to read and imho very easy to understand.

scripting commented 1 year ago

BTW, I came here to post this note.

It was noted earlier in this thread that there was no place to post comments or questions about the Source namespace. So I created a place and linked to it from the spec.

I also figured out what @ttepasse meant by the "linkblog feed" and fixed it.

Since there's now a place to comment on the Source namespace, there's no need to continue discussing it here. :smile: