As we discussed at the f2f meeting, I added a section that mentions this: https://indiewebcamp.com/wiki/index.php?title=Webmention&diff=23679&oldid=23678 Would you please review and let me know if that addresses your concerns? I'm happy to take suggestions for alternate wording as well.
Yeah, looks OK. Redirects are handled in a different issue; this seems fine.
Right now I am concerned. If the webmention endpoint is public and unauthenticated, then the computation it performs should be bounded. That is, when you receive a webmention, you should know that the effort it takes to verify is reasonably deterministic. If you have to perform a GET and then parse the response, the effort seems unbounded (unless the size of the source document is capped, etc.).
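For illustration, here is a minimal sketch of what a size- and time-bounded fetch of the source document could look like. The specific limits and the use of Python's standard library are my own assumptions, not anything the spec requires:

```python
import urllib.request

MAX_SOURCE_BYTES = 1 * 1024 * 1024   # assumed cap on the source document
FETCH_TIMEOUT = 5                     # seconds; assumed per-request budget

def fetch_source_bounded(source_url):
    """Fetch the webmention source with hard size and time limits."""
    with urllib.request.urlopen(source_url, timeout=FETCH_TIMEOUT) as resp:
        # Reject up front if the server declares a body larger than the cap.
        length = resp.headers.get("Content-Length")
        if length is not None and int(length) > MAX_SOURCE_BYTES:
            raise ValueError("source document too large to verify")

        # Read at most MAX_SOURCE_BYTES + 1 bytes so an undeclared or
        # streamed body cannot consume unbounded memory or bandwidth.
        body = resp.read(MAX_SOURCE_BYTES + 1)
        if len(body) > MAX_SOURCE_BYTES:
            raise ValueError("source document exceeded the size cap")
        return body
```

Only after a bounded fetch like this succeeds would the receiver go on to parse the document and look for the target link.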
So, webmentions can come from third parties (site C) to tell site A about a link from site B. But a malicious third party can find some large source page on B and send a flood of webmentions to A, causing a lot of wasted effort on A (and potentially bandwidth on B). Meaning, depending on how B mitigates:

- B could ban/block A.
- B could run out of bandwidth or be hit financially.
- Site A needs to block third-party C (but C can be any server anywhere, so it carries almost no burden).
- Site A could run out of CPU resources or bandwidth.
- Site A could be unable to reasonably schedule other webmention verifications among the other tasks it needs to do, because it doesn't know how much effort a webmention takes.
- Site A could be hit financially.

You can certainly nitpick these, but only one has to be reasonably true.
For verification to be reasonably bounded, you'd sign the message so that site A can verify its origin, or have some specified way of bounding the GET that is made and refusing to do anything else. That would require the source to participate more actively in the protocol, but it would limit the amount of data A needs to pull from B. There are already solutions in other, similar protocols. Salmon, for instance, signs the message, and therefore only the origin requires verification, not the semantics of the message.
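As a purely illustrative sketch (nothing like this exists in the Webmention spec, and Salmon actually uses public-key "magic signatures" rather than a shared secret), origin verification could be a single bounded computation over the webmention parameters. The header name, the per-source shared secret, and the HMAC construction below are all my own assumptions:

```python
import hashlib
import hmac

def verify_signed_webmention(source, target, signature_hex, shared_secret):
    """Check an HMAC over the webmention parameters.

    shared_secret is assumed to be bytes agreed with the source site.
    This bounds verification to one in-memory computation; no GET of the
    (possibly huge) source document is needed to establish origin.
    """
    message = f"source={source}&target={target}".encode("utf-8")
    expected = hmac.new(shared_secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

The point is not this particular scheme, but that origin verification costs the receiver a fixed, predictable amount of work regardless of how large the source document is.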
At any rate, the simplicity of the protocol makes it very fragile. Some discussion of the implications of verification and how to avoid such pitfalls would therefore be beneficial: for example, rejecting documents that are too large to verify, limiting redirects, checking for and refusing streamed data, and caching popular documents.
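To make that concrete, here is one possible shape for those pre-flight checks before a receiver ever parses a source document. The specific limits, the HEAD request, and the naive cache are my own assumptions, not anything the draft mandates:

```python
import urllib.request

MAX_REDIRECTS = 3                   # assumed redirect budget per verification
MAX_SOURCE_BYTES = 1 * 1024 * 1024  # assumed cap on the source document
_precheck_cache = {}                # naive cache of recently checked sources

class _LimitedRedirects(urllib.request.HTTPRedirectHandler):
    # urllib consults this attribute when following redirects; lowering it
    # caps the work spent chasing a source URL (assumed budget, not spec'd).
    max_redirections = MAX_REDIRECTS

def precheck_source(source_url):
    """Decide whether a source document is cheap enough to verify."""
    if source_url in _precheck_cache:        # popular documents: reuse result
        return _precheck_cache[source_url]

    opener = urllib.request.build_opener(_LimitedRedirects)
    req = urllib.request.Request(source_url, method="HEAD")
    with opener.open(req, timeout=5) as resp:
        length = resp.headers.get("Content-Length")
        # Refuse streamed/chunked responses with no declared length, and
        # anything larger than the verification cap.
        if length is None or int(length) > MAX_SOURCE_BYTES:
            raise ValueError("source document is unbounded or too large")

    _precheck_cache[source_url] = True
    return True
```

A HEAD request won't be honored everywhere, so a real implementation would still need the size-capped GET shown earlier as a backstop; the sketch only shows how the listed mitigations could fit together.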
(As an implementer, I wouldn't recommend adding such a potentially open protocol to one's software. If it had easy, reasonably bounded origin verification, which I would suggest as a MUST, then yes. I'd also recommend that full-text verification be done only optionally, at idle times, for servers I don't trust or know, and with a long set of restrictions such as those I listed. I'm still thinking about it a lot, though; full-text verification seems a bit too game-able to reliably enforce trust over time.)