Closed hvdsomp closed 8 years ago
Extending this to PDF is challenging, yes. Many publications of this nature do have HTML versions too, though they are still under access control. For example, http://onlinelibrary.wiley.com/doi/10.1002/asi.21571/abstract (which I found by searching Google Scholar for webmention) has HTML versions with controlled access.
If you are part of an institution that has gateway access to these kinds of publication, you could run the webmention verifier on a server that can use the institutional proxy, and verify the HTML versions of the documents that way.
The previous work here may be useful for you: http://lombardpress.org/2016/04/16/iiif-webmentions/
My comment was about (scholarly) publisher-to-publisher use of Webmention. An institutional subscription has nothing to do with the problem I describe. This is not about a user having access to a paper or not. This is about the receiving publisher not having access to sender publisher content. The PDF issue is kind of secondary in the problem I describe. The core issue is the paywall.
The requirement for an HTTP GET would just mean that the publisher of the webmention would need to have some page which is not behind a paywall that could simply list the datasets it references. A simple abstract or bibliography page could easily be made available in html without the PDF content. If memory serves correctly, many already do that.
@hvdsomp "core issue is the paywall" - this is an astute observation, and not unique to webmention. Paywalls break all sorts of Web Architecture. Hyperlinks, img src, script src, style sheets, iframes, pretty much all web hypertext / hypermedia. I suggest you consider raising this as an issue with the W3C TAG (@w3ctag), something like "Paywalls break web architecture, what is to be done about this?" should kick-off a good discussion. Perhaps you can convince the Web Payments WG to take on "pay walls" as a use-case as well. It's definitely a reality of current (attempted) use of the web, and something worthy of further cross-group discussion.
Hey tantek, that's one big can of worms you suggest I open ;-) The thing is, I have been an Open Access advocate from the early days of the movement. So, I don't want paywalls in scholarly communication. And I definitely don't want to spend time in that wormhole. But paywalls are a fact of life and I would love to work towards establishing increased web-centric interoperability for scholarly communication (e.g. using Webmention) in the current environment. Hence my "trust" suggestion as an alternative to an actual HTTP GET on the source.
> Hence a suggestion to consider an additional aspect regarding "Webmention verification", which could be along these lines "if the receiver has a trust relationship with the sender, verification is optional".
I actually +1 this general idea, but having the 'trust relationship' out of band is kind of awkward. ~Or maybe not? As how to do verification is out of scope of the webmention spec, and actually up to the receiver, you could choose to "do verification" by consulting an internal list of domains you trust.~
ActivityPub's method of doing notifications using the ActivityStreams2 vocabulary (summarised here) allows you to include an authentication token of some kind (to be determined I think) in the payload with the notification, so you might not need to GET to verify based on that. It also lets you send more than just the source and the target as part of the notification, so if you can't GET it and you do trust the source, you can take that data at face value and use it to decide how to display it (or whatever else you might want to do with it).
The paywalls I regularly deal with "only" hide the main content but expose things like citation lists, so as long as those are properly linked, Webmentions between the paywalled pages via the DOI links could work.
Also, paywall pages are a similar issue to silo pages like Twitter in that they hide content and don't include nice markup. For those there are services like https://brid.gy/ that provide nicely formatted metadata for Webmention endpoints. Something similar could be made for paywalls as well, but that is a stop-gap solution that requires extra work. (It also requires extra trust relationships with those services, but no explicit authentication.)
> how to do verification is out of scope of the webmention spec
That's incorrect. The spec describes specifically how to verify the link for HTML and JSON documents here: https://www.w3.org/TR/webmention/#webmention-verification
I think @dissolve's suggestion is on the right track. If the publishers show the list of other articles they mention on the "landing page", then normal webmention verification will work.
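To make the landing-page idea concrete, here is a minimal sketch of the link check a receiver might run on the fetched source. It assumes the source has already been retrieved as HTML; the class and function names, the example URLs, and the sample landing page are all hypothetical, and a real receiver would also handle redirects, other content types, and relative URLs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.add(value)


def source_mentions_target(source_html: str, target_url: str) -> bool:
    """Return True if the source HTML contains an exact link to target_url."""
    collector = LinkCollector()
    collector.feed(source_html)
    return target_url in collector.links


# Hypothetical public landing page: a title plus a reference list,
# with the full text still behind the paywall.
landing_page = """
<html><body>
  <h1>Example paper</h1>
  <ol class="references">
    <li><a href="https://data.example.org/dataset/1">Example dataset</a></li>
  </ol>
</body></html>
"""

print(source_mentions_target(landing_page, "https://data.example.org/dataset/1"))  # prints True
```

If the references were only loaded via script after page load, they would not appear in the fetched HTML and this check would fail, which is why the reference list needs to be in the page markup itself.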
Hi sknebel, yes, some paywalls provide eg reference lists for free, others don't, yet others don't even provide a metadata record describing the content itself without providing credentials. Lots of variations on the theme.
As suggested by dissolve and you, I agree that things could be done to make webmention work even for paywalled environments but that would require extra effort beyond just the implementation of webmention, eg exposing special-purpose resources. I am not feeling it ...
@aaronpk Sorry, I guess I was thinking about verification beyond string-matching (deciding whether it's a type of mention you're interested in), which is out of scope, but you're right that the string matching is actually what matters in this case.
I could see sending some sort of auth token along with webmention being an extension to webmention. But the fact that webmentions can normally be sent by anyone means it has to be an auth token, not just 'trusting' / whitelisting some other location.
Yes, it would require some extra work, but not much. Especially since, in your example case, there is already a landing page which is not actually the PDF. Since the landing page is where the source resolves to, that is where the verification would have to be done, not in the PDF that's behind a paywall.
Yeah you'd have to do this as an authenticated request, since any server can post a source and target to you. You can't simply write a rule that trusts a source domain, since any sender could send webmentions with that source domain. I also doubt you want to set up your system to "trust" webmentions sent from a specific IP address.
We actually already added a little note hinting at the potential for authenticated webmention requests here: https://www.w3.org/TR/webmention/#cross-site-request-forgery
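As an illustration only, one way an authenticated request could look is a token derived from a secret that sender and receiver agreed on out of band. This is a sketch under that assumption: the signing scheme, the function names, and the idea of a `token` parameter are hypothetical and not defined by the Webmention spec.

```python
import hashlib
import hmac


def sign_webmention(secret: bytes, source: str, target: str) -> str:
    """Sender side: derive a token from the shared secret and the source/target pair."""
    message = f"{source}\n{target}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_authenticated(secret: bytes, source: str, target: str, token: str) -> bool:
    """Receiver side: recompute the token and compare in constant time."""
    expected = sign_webmention(secret, source, target)
    return hmac.compare_digest(expected, token)


# Hypothetical values for illustration.
shared_secret = b"agreed-out-of-band"
source = "https://publisher.example.org/paper/1"
target = "https://data.example.org/dataset/1"

token = sign_webmention(shared_secret, source, target)
print(is_authenticated(shared_secret, source, target, token))
```

Unlike a rule that trusts a source domain, a forged webmention that merely names the trusted domain as its source fails this check, because the forger does not hold the secret.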
Bridgy really is the best analogy here. You could construct an equivalent service that has authorization to see the papers, and makes proxy public web pages that mark them up with citation links. OpenLibrary or Google Scholar could make these. Then you are deciding to trust that service's mapping, not the webmention sender.
First off, I think the specific proposal ("if the receiver has a trust relationship with the sender, verification is optional") is okay. I also think it's redundant. Specifications exist in lieu of special agreements between parties. If two parties agree, they are always free to vary a protocol in the privacy of their own data streams. That's just how standards work. It's similar to how specifications say how parties have to behave, but not how they have to implement that behavior.
Second, on the issue of authentication, as we move to an HTTPS-everywhere Web, I wonder if we can't say that if a webmention is performed using the TLS certificate of the source, it need not be verified. I'm 75% sure TLS can be used that way. In some quick searching I was unable to find any reports of it being done, though.
Finally, on the specific use case, it really seems best to point out the advantages of having a public landing page and having that public landing page include the references, with links. By doing that, and supporting webmention, sites will not only provide a better service to end users, but increase pagerank, increase traffic, and draw in customers. The first landing page I tried almost worked, except the references were fetched via ajax, so they don't occur in the HTML by default. There's a flag to fix that ("show on one page"), so http://dl.acm.org/citation.cfm?id=383071&preflayout=flat should work fine with webmention.
Thanks for the feedback, sandhawke.
Regarding (1), things are a bit more complex:
Regarding (2): I can't really comment on your proposal. But I do know from implementing HTTPS for some Memento "Web Time Travel" services (actually in the context of a collaboration with the W3C) that it's a rather messy endeavor.
Regarding (3): Obviously I agree with all the benefits you mention regarding implementation of open landing pages with references, etc. It's just that many publishers will not be convinced. Reality. As I mentioned, there are even publishers that don't allow downloading, e.g., a BibTeX record describing a paper without the required credentials. Regarding your ACM example: it worked but it actually didn't work, right? And there are many, many more publishers out there, of course. Bottom line: many proposals I have seen in this thread require small or big technical and conceptual changes to publisher platforms in order for them to be able to implement the very simple and very useful Webmention protocol. Let's just say I am utterly skeptical ...
I was thinking more about this last night. What I would point out is that there is no requirement that the HTTP GET request on the source be unauthenticated; it can carry an auth token or other such data. Indeed, this will be needed when doing any sort of private webmentions.
Perhaps we could add a note to the text specifically calling out that the specifics of that GET are not defined in the webmention spec and may include additional auth mechanisms, etc.
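For example, a receiver that holds credentials for the sender's platform might attach them to the verification GET. The sketch below only builds the request and does not send it; the bearer-token scheme, the function name, and the URL are assumptions for illustration, since the spec does not define any auth mechanism for this request.

```python
import urllib.request


def build_source_request(source_url: str, token: str) -> urllib.request.Request:
    """Build a GET for the webmention source that carries an access token."""
    return urllib.request.Request(
        source_url,
        headers={
            # Hypothetical: a bearer token issued by the sending publisher.
            "Authorization": f"Bearer {token}",
            "Accept": "text/html",
        },
        method="GET",
    )


request = build_source_request("https://publisher.example.org/paper/1", "example-token")
print(request.get_method(), request.full_url)
# The receiver would then fetch it with urllib.request.urlopen(request)
# and run the usual link check on the response body.
```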
@hvdsomp I just realised you may not have got my reference to Bridgy. It's this site: https://brid.gy/ What you do is authenticate with your silo credentials, and then it will map the proprietary APIs and formats into HTML and webmention you with them. So it will map a tweet like this:
https://twitter.com/jlew8/status/735449055485165568
into this
https://brid-gy.appspot.com/post/twitter/kevinmarks/735449055485165568
which can then be parsed and added to the original post:
http://known.kevinmarks.com/2016/according-to-api-docs-you-cant-edit-people-out-of
This approach could work for the academic citation case, if you can create 'library cards' for the papers with abstract and references that send the webmentions.
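A rough sketch of what generating such a "library card" might look like, assuming the proxy service already has the title, abstract, and reference list as structured metadata. The function name, the markup, and the sample data are hypothetical; this is not how Bridgy itself is implemented.

```python
from html import escape


def render_library_card(title: str, abstract: str, references: list) -> str:
    """Render a public HTML page with an abstract and linked references.

    `references` is a list of (label, url) pairs.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in references
    )
    return (
        "<article>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <p>{escape(abstract)}</p>\n"
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</article>"
    )


# Hypothetical metadata for illustration.
card = render_library_card(
    "Example paper",
    "Methods & results in brief.",
    [("Example dataset", "https://data.example.org/dataset/1")],
)
print(card)
```

Because every reference is a plain link in the page markup, a receiver can verify a webmention whose source is the card URL without ever touching the paywalled paper.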
@hvdsomp Sure, I share your scepticism about publishers really participating in the Web. But what else can we do, but make it as easy and rewarding as possible for them, and as far as I can see the current Webmention spec does that. Maybe it could be explained in a way that would resonate with them more, perhaps as a use case in the spec? Like, have a use case that is landing pages for scientific publications, where a paper can learn about citations to it via webmention? I don't think any normative changes would help these folks, though, as I understand the problem.
sandhawke: a "landing page" use case might indeed be a good idea, along the lines of: increase inlinks to your landing page by (1) making references available in the landing page, (2) sending webmentions to referenced papers, and (3) having referenced papers link to your landing page.
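Step (2) can be sketched as follows: for each referenced paper, the sender prepares a form-encoded body with `source` set to its own landing page and `target` set to the referenced URL. The function name and URLs are hypothetical, and endpoint discovery and the actual POST are left out.

```python
from urllib.parse import urlencode


def webmention_payloads(landing_page_url: str, referenced_urls: list) -> list:
    """Build one form-encoded webmention body per referenced URL."""
    return [
        urlencode({"source": landing_page_url, "target": target})
        for target in referenced_urls
    ]


payloads = webmention_payloads(
    "https://publisher.example.org/paper/1",
    ["https://data.example.org/dataset/1", "https://other.example.net/paper/9"],
)
for body in payloads:
    print(body)
```

Each body would be sent as `application/x-www-form-urlencoded` to the Webmention endpoint advertised by the corresponding target.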
We discussed this during the f2f meeting and agreed to add a section describing the "landing page" use case. https://www.w3.org/wiki/Socialwg/2016-06-07-minutes#webmention-42-resolution
Thanks! I'm very happy with that resolution.
Greetings
Herbert
The requirement to do an HTTP GET on the source and to verify whether it indeed references the target excludes important use cases, for example in web-based scholarly communication. I will explain by means of a very hot topic: linking publications with datasets. Other cases exist.
For publication/dataset linking, the publication (source) would use Webmention to inform the dataset (target) that it is being referenced in the paper. Typically:
It would be very hard (or even impossible) to perform "Webmention verification" as described in the spec because:
Even if one were to use the URIs of the actual content (PDF file, dataset) instead of the DOIs as source/target URIs, two of the above problems would remain.
I very much understand that this problem is to a large extent related to the fact that web-based scholarly communication does not necessarily operate in a manner that aligns very well with the way other pockets of the web do. Then again, I assume paywalls and landing pages exist beyond scholarly communication. And, most importantly, I would love if webmention could be used in scholarly communication, see eg slides 45-52 of [http://www.slideshare.net/hvdsomp/reminiscing-about-interoperability].
Hence a suggestion to consider an additional aspect regarding "Webmention verification", which could be along these lines "if the receiver has a trust relationship with the sender, verification is optional".
Cheers
Herbert Van de Sompel Los Alamos National Laboratory