Closed kevinmarks closed 8 years ago
I don't see why the target is much use at all.
Here's a first cut at what I imagine webmention endpoints running:
# Fetch the source and record a verified mention for every tracked link in it.
result = urlfetch.fetch(mention.source)
if result.status_code == 200:
    html = result.content.decode('utf-8')
    links = parseForLinks(html)
    for link in links:
        if config.trackURL(link):
            mentions.add({'source': mention.source, 'target': link, 'verified': True})
This relies on some parseForLinks function, which could just be a regexp but which I hope will be formally defined for each media type (another issue?), and a config.trackURL function which tells us whether a given URL is one this endpoint is supposed to be tracking. We need that anyway. (I assume your code has something like that, probably a line or two earlier.)
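To make the two helpers concrete, here is a minimal sketch of what they might look like for HTML only. The names `parse_for_links`, `track_url` and `TRACKED_HOSTS` are illustrative assumptions, not anything the spec or an existing implementation defines, and the host check deliberately ignores redirects and short links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href values from <a> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_for_links(html, base_url):
    """Return every href in the HTML as an absolute URL."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


# Hypothetical configuration: the hosts this endpoint accepts mentions for.
TRACKED_HOSTS = {"example.com"}


def track_url(url):
    """True if the URL's host is one this endpoint tracks (no redirect following)."""
    return urlsplit(url).hostname in TRACKED_HOSTS
```

A receiver would call `parse_for_links` on the fetched source and keep only the links for which `track_url` is true.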
(This reminds me, I wanted to suggest an optional etag parameter in the webmention to avoid even needing the fetch at all if the source hasn't been modified. But I guess that's another thread. Or maybe it's in there already.)
The big upside of this is that when a page has multiple links that would each result in a webmention to the same endpoint, one webmention would cover them all and the rest could be skipped. The only downside I see is that maybe people want the existence of a webmention to have semantics, and this would automatically create a bunch that maybe people don't want, but I haven't heard that use case.
For me that trackURL function is an expensive operation. You cannot tell immediately if a link is actually pointing to me without following it to see if it's a short link or redirect. Twitter wraps all their links in their own url shortener. How do I ever tell if twitter links to me? I would not want to have to follow every link on a page to make sure it doesn't redirect to me.
Etag sounds interesting. Certainly make a thread for that.
Good point @dissolve, I'd forgotten that aspect. So, yeah, the URL shortener use case makes sense as a good reason to allow a target parameter. I'm not sure it justifies making it mandatory, though, since it's just a performance issue.
It's actually more than just performance. It allows me to use any webmention endpoint to magnify a few requests into a DDoS. I create a page with a couple thousand links to some poor soul's website. Then I hit your endpoint without specifying a target link. You just hit them with all those requests for me, and I only had to make one. Huge magnification of the attack. Do that to multiple endpoints and it's a pretty easy attack.
Target given in the webmention is not the link on your site, but the exact link I am posting. So you know the single link to follow, to see if it resolves to a page on your site.
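As a rough sketch of that single-link check: the receiver follows redirects from the one claimed target only, and then asks whether it lands on a host it owns. `OUR_HOSTS` and the `resolve` callable are assumptions made for illustration; in a real receiver `resolve` would issue an HTTP request and return the `Location` it is redirected to, or `None`.

```python
from urllib.parse import urlsplit

# Hypothetical configuration: hosts this receiver is responsible for.
OUR_HOSTS = {"example.com"}


def target_is_ours(target, resolve, max_hops=5):
    """Follow redirects from the claimed target and check where it ends up.

    `resolve(url)` is a hypothetical callable returning the URL that `url`
    redirects to, or None when `url` does not redirect.
    """
    url = target
    for _ in range(max_hops + 1):
        next_url = resolve(url)
        if next_url is None:
            # No further redirect: this is the final destination.
            return urlsplit(url).hostname in OUR_HOSTS
        url = next_url
    return False  # too many redirects, give up
```

The cost is bounded by `max_hops` requests per webmention, however many links the source page contains.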
Honestly I had forgotten this whole reasoning and just remembered it all now. Probably a good thing for an FAQ.
@dissolve Would that be addressed by saying the target MUST be provided in the case where there's a redirect (and is optional otherwise)? The service issuing the webmention will know if this is the case, since it had to dereference the target to find out the endpoint's address.
I ask mostly out of curiosity. If it's required in that case, it's probably no help to anyone to make it optional in other cases.
Huh. I hadn't thought about it in that way but yes I would say target is a MUST for cases where there is a redirect.
It's also pretty trivial to include since you have to know the target when you send the webmention. So it's not like it adds any work on the sender (other than the case of multiple mentions needing to be sent to one endpoint) and can make the receiver's job much easier.
The etag/Last-modified handling belongs in the webmention receiver. That's on my list as indiewebify.me would thrash my server if it pinged homebrew website club notes at the moment.
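A receiver-side version of this might look roughly like the following sketch, which only builds the revalidation headers and interprets the status code. The shape of the cached record (a dict with `etag` and `last_modified` keys) is an assumption for illustration; `If-None-Match`, `If-Modified-Since` and the 304 status are standard HTTP.

```python
def conditional_headers(cached):
    """Build revalidation headers from a cached record (a dict, or None)."""
    headers = {}
    if cached:
        if cached.get("etag"):
            headers["If-None-Match"] = cached["etag"]
        if cached.get("last_modified"):
            headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def needs_reparse(status_code):
    """A 304 Not Modified means the previously parsed result is still valid."""
    return status_code != 304
```

The receiver would send these headers with its GET of the source and skip re-parsing when `needs_reparse` returns false.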
When a webmention (claim) is submitted, the verification process simply checks whether it holds, i.e., whether it can be found at the source - after all, we are told that the source is making a statement about the target, so the question is, does it hold? The current spec says:
The receiver SHOULD perform a HTTP GET request on source to confirm that it actually links to target (note that the receiver will need to check the Content-type of the entity returned by source to make sure it is a textual response).
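One possible reading of that check, sketched for HTML sources only: reject anything that is not an HTML media type, then look for an `href` that resolves to exactly the target. The function name and the exact-match rule are assumptions; what "actually links to target" should mean is precisely what is under discussion here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects every href attribute value in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def source_links_to(content_type, body, source, target):
    """True if an HTML source contains an href resolving exactly to target."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ("text/html", "application/xhtml+xml"):
        return False  # assumption: this sketch only understands HTML
    collector = HrefCollector()
    collector.feed(body)
    return any(urljoin(source, href) == target for href in collector.hrefs)
```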
The semantics of whether "it actually links to target" or not could be clearer. I think this is the point we need to clarify in the spec - created issue #17 - and that will give us a better grounding for this issue.
If we look at the shortURL example 2 at https://github.com/w3c-social/webmention/issues/17#issue-119353004 , cases b and c essentially result in the same scenario. Therefore, unless the target receives a URL which can be found at the source URL - naturally omitting the follow-your-nose case - it makes no difference if the source is using a shortURL or not. Which is equivalent to not being provided with the target to begin with. Essentially there is no description or guidance on what to do with URLs which are not found at the source.
Indeed, we should probably state specifically that if the URL is not found you disregard the webmention. Implementers can of course do whatever they want, but the best practice would be to not waste time on a malformed webmention.
Instead of saying what to keep and what not to keep you can speak about Truth conditions: What is it that makes the content of the post True. If the agent wants to keep false propositions, then that's up to it/him/her.
Actually what you are doing is explaining how to verify the truth of the statement made in the POST. Verifiability and truth conditions are closely related of course, even if not as closely as A. J. Ayer believed in "Language, Truth and Logic", published in the 1930s. This has a long history, needless to say.
Yes Henry, and Gödel showed truth was undecidable by computers shortly afterwards.
Returning to empiricism, in practice we have services (e.g. http://brid.gy ) that remap twitter short URLs into webmentions by expanding them, and that in effect implement @sandhawke's trackURL model by monitoring twitter broadly. So, like indiewebify.me, this is support infrastructure for the protocol rather than an implementation of it, which is a sign that as implemented it has useful boundary conditions.
@kevinmarks Tarski, Gödel and other logicians are the bread and butter of philosophical thinking on meaning and truth. Just check Semantic Theory of Truth, or the Stanford encyclopedia of philosophy.
The protocols we are designing are not to make computers distinguish truth from falsity, but for us to be able to reason about them, in a court of law for example, or when building protocols, or software agents. Humans do that, and it helps to take into account the thinking on meaning that has developed throughout the 20th century, just like it helps when building skyscrapers to know about maths, material science, and many other subjects.
@bblfish @kevinmarks that's getting completely off topic
There are a few reasons why the target parameter is beneficial.
Looking at the implications for DoS attacks, without the target parameter, it becomes trivial to cause a webmention receiver to do unnecessary work of verifying invalid webmentions (webmentions that don't link to the target). At least when the target is required, an attacker has to customize the request per victim.
Without the target parameter, the webmention payload becomes ambiguous, since the source likely links to more than one target. Which target is the sender interested in for a given webmention request then? What kind of error response should the receiver return if it supports receiving mentions for more than one link on the page?
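With target required, the receiver can reject an ambiguous request before doing any fetching at all. A minimal sketch, assuming the form parameters arrive as a dict; the function name and the choice of 202 for "accepted, will verify later" are illustrative assumptions rather than something settled in this thread.

```python
from urllib.parse import urlsplit


def validate_mention(params):
    """Return (status_code, message) for the form parameters of a webmention POST."""
    source = params.get("source")
    target = params.get("target")
    if not source or not target:
        return 400, "both source and target are required"
    for name, url in (("source", source), ("target", target)):
        if urlsplit(url).scheme not in ("http", "https"):
            return 400, name + " must be an http(s) URL"
    # Cheap checks passed; the expensive fetch of source can be queued.
    return 202, "accepted for verification"
```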
I am glad to see the related issue #17 come out of this thread.
I'm going to close this issue as I don't think anyone was actually advocating for dropping the "target" parameter, but it was an interesting thought experiment.
@kevinmarks please comment here if you are satisfied or unsatisfied with the result of this discussion
I'm satisfied that we're keeping target as required.
Forking this off from #1 as they are independent issues.
Currently the spec says that the target parameter is required, so that a minimal webmention parser can just check that the URL is in the source document. My naïve example:
https://github.com/kevinmarks/mentiontech/blob/master/main.py#L119
Clearly, actually parsing the source document for actual links would be an enhancement here.
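For illustration, the naive approach amounts to a substring test. The sketch below is not the code at the link above, just an assumed equivalent, and it shows why parsing for real links would be an improvement: the URL can appear as plain text, or as the prefix of a longer URL, and still pass.

```python
def naive_verify(source_html, target):
    """Naive check: does the target URL appear anywhere in the source text?"""
    return target in source_html


target = "https://example.com/post"

# A genuine link passes, as expected.
genuine = naive_verify('<a href="https://example.com/post">hi</a>', target)

# A longer, different URL mentioned in plain text also passes: a false positive.
false_positive = naive_verify("<p>see https://example.com/post-2</p>", target)
```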
in https://github.com/w3c-social/webmention/issues/1#issuecomment-159755367 @csarven says
This is true in the specific case that the webmention endpoint is tightly coupled to a particular domain, and thus can know a priori which links are within its purview. That is a common case for webmention, but it is not the only possible case, as webmention receivers can support multiple target sites.
There is another case where only a source can work - if you are sending webmentions on behalf of a page. indiewebify.me does this. However this is more of a webmention supporting service than an implementation of the protocol (it accepts a url parameter, not source).
Further comments from #1: @rhiaro:
@dissolve:
@csarven:
I hope I have captured everyone's arguments on this point. If not, please comment below.