Open snarfed opened 1 year ago
@snarfed this might be an issue with the mf2 parser, because it supports fragment parsing.
@snarfed is the author outside of the fragment?
The source URL doesn't contain a fragment, it contains %23, which happens to be an encoded # character. I think the plugin(s) are decoding that part of the URL, but shouldn't be, since the form-encoded POST body shouldn't be URL-decoded. (I think?)
Ideally the plugins/parser would leave that %23 in the URL alone when fetching it and parsing mf2.
This is a really good question!
I would assume that they have to be URL-encoded, because otherwise an = might be misinterpreted as part of the form's parameter syntax.
And the content type is application/x-www-form-urlencoded, so it literally mentions "urlencoded", but I will have a look at the spec.
From @sknebel in chat:
for keys and values, percent-encode everything "except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_)." URL spec: https://url.spec.whatwg.org/#concept-urlencoded-serializer (and the quote specifically from https://url.spec.whatwg.org/#application-x-www-form-urlencoded-percent-encode-set )
I've confirmed that browsers URL-encode, so a form-encoded POST with key url and value http://test/url%23fragment results in the raw request body url=http%3A%2F%2Ftest%2Furl%2523fragment .
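A minimal sketch of that behavior, using Python's standard library rather than a browser (so it only approximates what a browser does, though urlencode follows the same form-encoding rules for these characters). The variable names are just for illustration:

```python
from urllib.parse import urlencode, parse_qs

# The value already contains a percent-encoded "#" (%23).
value = "http://test/url%23fragment"

# Form-encoding the body encodes the "%" itself, so %23 becomes %2523.
body = urlencode({"url": value})
print(body)  # url=http%3A%2F%2Ftest%2Furl%2523fragment

# A receiver that decodes the form body exactly once gets the original value back,
# with %23 still intact.
decoded = parse_qs(body)["url"][0]
print(decoded == value)  # True
```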
I've also confirmed that my code is doing the same thing, ie the # is double-URL-encoded to %2523, so the raw webmention POST body looks like:
source=https%3A%2F%2Ffed.brid.gy%2Frender%3Fid%3Dhttps%253A%252F%252Findieweb.social%252Fusers%252Fsnarfed%2523likes%252F709275&target=https%3A%2F%2Fsnarfed.org%2F2023-03-28_49662
Note the %2523 in the source value. So @pfefferle you're absolutely right, the Webmention/Semantic Linkbacks plugins should URL-decode it once to get %23, but I think not twice, which they seem to be doing right now?
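To make the once-versus-twice difference concrete, here is a small Python sketch using the raw body above. It only illustrates the decoding layers. Whether the plugins actually decode twice is the open question in this thread, not something this code demonstrates.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Raw webmention POST body from the comment above, split across lines for readability.
raw_body = (
    "source=https%3A%2F%2Ffed.brid.gy%2Frender%3Fid%3Dhttps%253A%252F%252F"
    "indieweb.social%252Fusers%252Fsnarfed%2523likes%252F709275"
    "&target=https%3A%2F%2Fsnarfed.org%2F2023-03-28_49662"
)

# Decoding the form body once yields the intended source URL; %23 stays encoded.
once = parse_qs(raw_body)["source"][0]
print(once)
print(repr(urlsplit(once).fragment))  # no fragment

# Decoding a second time turns %23 into a literal "#", which URL parsers
# treat as the start of a fragment, so the query string is truncated.
twice = unquote(once)
print(twice)
print(repr(urlsplit(twice).fragment))
```

After the second decode, everything from the # onward is a fragment and would not be sent to the server when fetching the source.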
OK, that might be possible because of the interaction of both (Webmention & SL) plugins, I will re-check the latest version of the Webmention plugin.
Looks like this isn't about the # character at all. I added custom encoding for #s, I'm now replacing them with ^^, and I'm still hitting this problem. Here's an example source URL:
https://fed.brid.gy/render?id=https%3A%2F%2Ftechhub.social%2Fusers%2Fdiazona^^likes%2F979471
If I send a webmention with this source, I get:
{"code":"resource_not_found","message":"Resource not found","data":{"status":400}}
Same if I %-encode the ^^, ie:
https://fed.brid.gy/render?id=https%3A%2F%2Ftechhub.social%2Fusers%2Fdiazona%5E%5Elikes%2F979471
However, if I double-encode those chars to %255E, as in the source URL below, it works.
https://fed.brid.gy/render?id=https%3A%2F%2Ftechhub.social%2Fusers%2Fdiazona%255E%255Elikes%2F979471
Here are example WP debug logs I see for a failed webmention with a source URL with ^^ in it:
[25-May-2023 02:21:48 UTC] REST request: /webmention/1.0/endpoint: {"source":"https:\/\/fed.brid.gy\/convert\/activitypub\/webmention\/https:\/mastodon.social\/users\/notblanklikes\/88327162","target":"https:\/\/snarfed.org\/2023-05-24_50288"}(Header Present)
[25-May-2023 02:21:48 UTC] REST result: /webmention/1.0/endpoint: {"code":"source_error","message":"Bad Gateway","data":{"status":400}}(400) - [](User ID: 0)
The full source URL was https://fed.brid.gy/convert/activitypub/webmention/https:/mastodon.social/users/notblank^^likes/88327162 . Note that the logged source URL is missing the ^^. I get the same logs if I URL-encode the ^^ to %5E%5E.
Btw this is on pre-merge plugins, ie Webmention 4.0.9 and Semantic-Linkbacks 3.12.0.
Why do people put everything in URLs...??? (and please do not answer with: because they can ☺️ )
Hah, fair point, maybe I'm being a bit difficult here. Sorry! This bug does seem unrelated to any individual characters though, since it happens when they're URL-encoded too, eg the examples here with both %23 and %5E%5E still break the plugin.
I'm open to other ideas! I need to be able to include arbitrary URLs, including ones with # fragments, but I can encode them however works best for you all.
esc_url, esc_url_raw and sanitize_url seem to remove the ^^ special chars. That's not great, because these functions are highly recommended when dealing with URLs.
At least it isn't double encoding or something similar.
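As a rough illustration of how allowlist-style sanitizing could produce the logged URL, here is a Python sketch. The character set and the strip_disallowed helper are hypothetical, chosen only for illustration. They are not copied from WordPress, and this is not a port of esc_url.

```python
import re
from urllib.parse import unquote

# Hypothetical allowlist for illustration only; NOT WordPress's actual rule set.
# Any character outside this set is removed. Note that "^" is not in it.
DISALLOWED = re.compile(r"[^A-Za-z0-9\-~+_.?#=!&;,/:%@$|*'()\[\]]")

def strip_disallowed(url: str) -> str:
    """Drop every character that is not on the (assumed) allowlist."""
    return DISALLOWED.sub("", url)

full = ("https://fed.brid.gy/convert/activitypub/webmention/"
        "https:/mastodon.social/users/notblank^^likes/88327162")

# The literal ^^ is silently dropped, matching the "notblanklikes" seen in the log.
print(strip_disallowed(full))

# Percent-encoded carets survive this filter on their own...
encoded = full.replace("^^", "%5E%5E")
print(strip_disallowed(encoded))

# ...but not if the value is decoded before it is sanitized (an assumption about
# ordering, offered as one possible explanation for the identical logs).
print(strip_disallowed(unquote(encoded)))
```

If something like this is what's happening, it would explain why only the double-encoded %255E form reaches the fetch step with its carets still encoded.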
Odd: I switched back from ^^ to %23 recently, and now I'm seeing some of these source URLs work after all. Example: https://ap.brid.gy/convert/web/https:/bayes.club/users/zerology%23likes/32983 on https://snarfed.org/2023-07-10_50589
@snarfed that makes sense, because if you check the HTML of the fed.brid.gy links (vs the AP links), you find only an h-card without any context. That's why the plugin ignores them: it doesn't know how to handle them.
Hmm! You're right about the top source URL in the original description, https://fed.brid.gy/render?id=https%3A%2F%2Findieweb.social%2Fusers%2Fsnarfed%23likes%2F709275 . Not sure what's going on there.
The rest of the source URLs here are valid u-like-ofs though, including the second one in the description, https://fed.brid.gy/render?id=https%3A%2F%2Findieweb.social%2Fusers%2Fsnarfed%2523likes%2F709275 .
Hi @dshanske @pfefferle! I'm seeing an odd issue with source URLs with URL-encoded # characters, eg https://fed.brid.gy/render?id=https%3A%2F%2Findieweb.social%2Fusers%2Fsnarfed%23likes%2F709275 . That page has a u-like-of with a full p-author h-card, with name and photo, but when WordPress receives it as a webmention source, Semantic-Linkbacks doesn't find that author at all.
However, if I double-URL-encode the # character, ie https://fed.brid.gy/render?id=https%3A%2F%2Findieweb.social%2Fusers%2Fsnarfed%2523likes%2F709275 , the webmention works fine and correctly shows the author name and image.
I know URLs with #s are awkward, even when URL-encoded, but the first source URL is working ok with other wm receivers, eg https://www.jvt.me/week-notes/2023/09/ (scroll down and expand Interactions with this post), so I suspect this is a bug in this plugin or Semantic-Linkbacks?
Thanks in advance!