superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] Allow `web+ap://` links through the HTML sanitizer, handle search for `web+ap://` links at `/api/{version}/search` #2059

Open mirabilos opened 1 year ago

mirabilos commented 1 year ago

Is your feature request related to a problem?

I’ve been frustrated by people trying to use Fediverse links with their webbrowsers, just because they start with https:// because They didn’t use an extra præfix or separate scheme for them when They invented AP.

Apparently, web+ap:// is the new hotness; FediText already supports it. However, trying to post…

web+ap://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ

<web+ap://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ>

[`web+ap://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ`](web+ap://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ)

… results in sadness (with either Semaphore or FediText):

             id             |                                                                                                                                                                                                          content                                                                                                                                                                                                           
----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 01H6VKJDZ49D556NEV9WKDP0DM | <p>web+ap://toot.mirbsd.org/<span class="h-card"><a href="https://toot.mirbsd.org/@mirabilos" class="u-url mention" rel="nofollow noreferrer noopener" target="_blank">@<span>mirabilos</span></a></span>/statuses/01H6VF1VAKJQSB6B874GHCP0PQ</p><p>web+ap://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ</p><p><code>web+ap://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ</code></p>
(1 row)

Describe the solution you'd like.

Support web+ap:// links as first-class citizens.

Describe alternatives you've considered.

NONE

Additional context.

No response

mirabilos commented 1 year ago

Related FediText commit (and I hope I understood what it does correctly): https://github.com/feditext/feditext/commit/3373f6ef6682617ed49ee2640b8c11eb678b4b18

tsmethurst commented 1 year ago

It's not clear what you mean by 'support [this thing]'. Support how? Is there a link to more documentation somewhere?

mirabilos commented 1 year ago

tobi dixit:

> It's not clear what you mean by 'support [this thing]'. Support how?

Allow people to post web+ap://… links as links.

> Is there a link to more documentation somewhere?

No idea, I only know it from FediText and the proposal from https://portend.place/objects/f1016527-1acc-4045-a138-285eb4fa35d7 which probably should be web+ap://portend.place/objects/f1016527-1acc-4045-a138-285eb4fa35d7 so people know to not click it in a browser.

IIUC it just says to s/web+ap/https/ but open in a Fedi client.
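A minimal sketch of that substitution, assuming the rule really is just a prefix swap as described in the comment above. The function name `web_ap_to_https` is made up for illustration and is not part of GoToSocial or FediText.

```python
def web_ap_to_https(link: str) -> str:
    """Swap a leading web+ap:// for https://; leave any other input untouched."""
    prefix = "web+ap://"
    # URI schemes are case-insensitive, so compare in lower case.
    if link.lower().startswith(prefix):
        return "https://" + link[len(prefix):]
    return link


print(web_ap_to_https(
    "web+ap://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ"
))
```

A client would then hand the resulting https:// URL to its own instance rather than to a browser.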

bye, //mirabilos -- "Using Lynx is like wearing a really good pair of shades: cuts out the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL." -- Henry Nelson, March 1999

tsmethurst commented 1 year ago

Hmm okay I see, related to https://github.com/superseriousbusiness/gotosocial/issues/1768

mirabilos commented 1 year ago

tobi dixit:

> Hmm okay I see, related to https://github.com/superseriousbusiness/gotosocial/issues/1768

Yeah, basically a better way to do that by changing the scheme part of the URI instead of requiring link annotation. (It will of course take some time until this can actually be used by default.)

I found¹ the suggester² patched³ their Akkoma to “do” that, but in GtS we don’t have the entangling with a “house client” (which is good) so I think outbound and inbound support for links with that scheme may be sufficient.

¹ https://fedi.software/notes/9hqffoc2elrwrbdz
² https://portend.place/objects/f1016527-1acc-4045-a138-285eb4fa35d7
³ https://portend.place/objects/ad803022-f201-4b39-a233-128a8f41f44b

bye, //mirabilos -- 08:05⎜<XTaran:#grml> mika: Does grml have an tool to read Apple ⎜ System Log (asl) files? :) 08:08⎜<ft:#grml> yeah. /bin/rm. ;) 08:09⎜<mrud:#grml> hexdump -C 08:31⎜<XTaran:#grml> ft, mrud: g

SoniEx2 commented 1 year ago

the main things GtS can do here are:

feel free to come by #fedilinks at libera.chat as well

VyrCossont commented 1 year ago

@tsmethurst From the client perspective, what I'd like to see is either that web+ap URLs get allowed through the GtS HTML sanitizer (preferred, clients can then handle them specially) or that they get rewritten into their https equivalents (compatible with clients that don't understand web+ap, but loses information, specifically, that the thing linked to is an ActivityPub actor or activity that can be resolved with a search?resolve=true call).
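To illustrate the `search?resolve=true` call mentioned above, here is a rough sketch of how a client might build the request path. It assumes the Mastodon-style `q` and `resolve` query parameters and the `v2` API version; the helper name `build_resolve_search_path` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_resolve_search_path(link: str, api_version: str = "v2") -> str:
    """Build a search path asking the home instance to resolve a remote link."""
    # urlencode percent-encodes the link so it survives as a single query value.
    query = urlencode({"q": link, "resolve": "true"})
    return f"/api/{api_version}/search?{query}"


path = build_resolve_search_path(
    "https://toot.mirbsd.org/@mirabilos/statuses/01H6VF1VAKJQSB6B874GHCP0PQ"
)
print(path)
# Decoding the query again gives back the original link unchanged.
print(parse_qs(urlsplit(path).query))
```

Whether the server accepts a web+ap:// value in `q` directly is exactly what this issue asks for; until then a client would pass the https:// form.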

Feditext supports web+ap links by registering as a handler for them, so if a user clicks one on a web page or in another app, it'll open in Feditext.

daenney commented 1 year ago

I think allowing web+ap:// through the sanitizer shouldn't be a problem.
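As a rough picture of what "allowing it through the sanitizer" means, the sketch below checks a link's scheme against an allowlist. It is not GoToSocial's sanitizer (the thread mentions BlueMonday for that); the scheme list and the name `href_is_allowed` are assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Assumed allowlist; the real sanitizer's list may differ.
ALLOWED_SCHEMES = {"http", "https", "mailto", "web+ap"}


def href_is_allowed(href: str) -> bool:
    """Return True if the href carries an explicitly allowed URI scheme."""
    try:
        scheme = urlsplit(href.strip()).scheme.lower()
    except ValueError:
        # Malformed input (for example a broken IPv6 literal) is rejected.
        return False
    return scheme in ALLOWED_SCHEMES


print(href_is_allowed("web+ap://toot.mirbsd.org/@mirabilos"))
print(href_is_allowed("javascript:alert(1)"))
```

Relative links have no scheme and are rejected by this sketch; a real policy would decide that case separately.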

What I'm not so sure about is propagating/using web+ap:// links. There's two possible issues here, and I'm not sure how to solve it:

I don't see an obvious way to gracefully support this in a way that we don't leave a lot of people behind. So even if we allow it through the sanitiser, it seems like we would have to rewrite those links on our side?

mirabilos commented 1 year ago

Daenney dixit:

> What I'm not so sure about is propagating/using web+ap:// links.

Yeah, that one is clearly in “not yet” territory.

Handling it in searches is needed, though, so the clients can actually begin to support it (for the copy/paste into search box manually use case at least).

bye, //mirabilos --

Hi, does anyone sell openbsd stickers by themselves and not packaged with other products? No, the only way I've seen them sold is for $40 with a free OpenBSD CD. -- Haroon Khalid and Steve Shockley in gmane.os.openbsd.misc

SoniEx2 commented 1 year ago

we do think the web UI should have the web+ap post/user link somewhere. this could be done with a new button. rewriting them as they come in is :woman_shrugging: but may be a good temporary solution as clients slowly add support.

but at the same time, not rewriting them is good because clients can detect them and open them in-app instead of having to pass every clicked link back to the server for detection.

because, yes, some clients will leak every link you click back to your mastodon server. for the sake of "fedi link detection". privacy is dead and they killed it.

VyrCossont commented 1 year ago

> I don't see an obvious way to gracefully support this in a way that we don't leave a lot of people behind. So even if we allow it through the sanitiser, it seems like we would have to rewrite those links on our side?

Without a client/server extensions negotiation mechanism, yeah, and I'm not sure we want to reinvent that much of email. #1768 mentioned using <a type="…"> as a compatible hint to the client as to what we expect on the other end of the link. Suppose we rewrite web+ap:// links to https:// links rendered using <a type="application/activity+json" href="…"> tags?
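A small sketch of the rewrite suggested above: turn a web+ap:// link into an https:// anchor carrying a `type` hint. This only shows the output shape under the assumption of a plain prefix swap; `render_hinted_anchor` is a hypothetical helper, not sanitizer code.

```python
from html import escape


def render_hinted_anchor(link: str) -> str:
    """Render a web+ap:// link as an https:// anchor with an ActivityPub type hint."""
    prefix = "web+ap://"
    if not link.lower().startswith(prefix):
        raise ValueError("not a web+ap link")
    href = "https://" + link[len(prefix):]
    # Escape the URL so it is safe both as attribute value and as link text.
    safe = escape(href, quote=True)
    return f'<a type="application/activity+json" href="{safe}">{safe}</a>'


print(render_hinted_anchor("web+ap://instance.example/id"))
```

Clients that understand the hint can resolve the link via search; others fall back to treating it as an ordinary https:// link.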

I'm ignoring implementation complexity up to this point, though. Guessing BlueMonday's sanitizer isn't built for transforms that specific.

VyrCossont commented 1 year ago

@mirabilos unrelated to the bug but if you're going to insist on using the GitHub email gateway, please turn off your signature. It's confusing to have several lines of completely unrelated text tacked onto the end of every single one of your responses.

daenney commented 1 year ago

> Without a client/server extensions negotiation mechanism, yeah, and I'm not sure we want to reinvent that much of email. #1768 mentioned using <a type="…"> as a compatible hint to the client as to what we expect on the other end of the link. Suppose we rewrite web+ap:// links to https:// links rendered using <a type="application/activity+json" href="…"> tags?

Hmmmm, that's an interesting idea. I don't think that would break anything, I'm fairly certain browsers ignore that entirely. Something to test, but if that works I think that's probably our way forward.

tsmethurst commented 1 year ago

I'd be down for using 'type' for hinting, it's basically what I suggested here anyways: https://github.com/superseriousbusiness/gotosocial/issues/1768 ;) (NOT TO BLOW MY OWN HORN OR ANYTHING BUT PARP PARP)

I have some concerns when it comes to doing that for incoming <a> elements though. Our heuristics for determining whether to add that type attribute (or indeed to rewrite the link to web+ap://) would look something like:

  1. Check every link in a new post that comes in by doing a SELECT in our database for that link as a URI, and that link as a URL, to see if it matches a status or account that we know about already.
  2. If not, then check each link by doing an HTTP call with Accept: application/activity+json or (more properly) Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams" in order to determine if it's a profile/status/whatever link or not.

This could lead to a large increase in the amount of both database and HTTP calls that your instance has to do: 2 extra db calls per link, and 1 extra HTTP call. If someone posts a status that's like 'here's a list of interesting links' and it has 10 links in it, that kinda sucks for us.
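The two-step heuristic and its cost can be sketched as follows. The sets stand in for the two database lookups and `fetch_content_type` stands in for the HTTP call; all names are hypothetical, and the counters only mirror the "2 db calls, 1 HTTP call per link" estimate above.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

AP_ACCEPT = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'
AP_TYPES = ("application/activity+json", "application/ld+json")


def classify_links(
    links: Iterable[str],
    known_uris: Set[str],
    known_urls: Set[str],
    fetch_content_type: Callable[[str, str], str],
) -> Tuple[Dict[str, bool], int, int]:
    """Return ({link: looks_like_activitypub}, db_calls, http_calls)."""
    result: Dict[str, bool] = {}
    db_calls = http_calls = 0
    for link in links:
        # Step 1: look the link up as a URI and as a URL (two lookups).
        db_calls += 2
        if link in known_uris or link in known_urls:
            result[link] = True
            continue
        # Step 2: ask the remote server, sending the ActivityPub Accept header.
        http_calls += 1
        content_type = fetch_content_type(link, AP_ACCEPT).lower()
        result[link] = content_type.startswith(AP_TYPES)
    return result, db_calls, http_calls


# Ten unknown links that all turn out to be plain web pages.
links = [f"https://example.org/page/{i}" for i in range(10)]
_, db, http = classify_links(links, set(), set(), lambda url, accept: "text/html")
print(db, http)  # 20 database lookups and 10 HTTP requests
```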

So I think for a first implementation, it's probably better to only add these type hints on outgoing statuses created on the instance. Then we can at least get a feel for whether that's useful for clients. If it does actually turn out to be useful, we could perhaps add a config option (default false) so that instance admins can choose to have their instance try to determine the type of incoming links as well.

Another thing that occurs to me is that using type on links doesn't really offer any hints to the client about what the ActivityStreams type of the Object is: is it a Person or a Note, or something else? I guess it doesn't matter too much since all it's doing is instructing clients 'put this link in the instance's search bar, it's something significant', but it would be cool to be able to include it somehow.

tsmethurst commented 1 year ago

Also just to add, I'm not really that bothered whether we use web+ap:// or link type hinting or something else to do this, since we don't really have a horse in this race at the SuperSeriousBusiness corporate towers.

The important thing from my perspective is that we just try to support something that's durable, and not put too much work into a solution that nobody else implements, or which ends up getting deprecated after a few months. In my view, using existing HTML properties is probably better for this than trying to shoehorn in a new protocol, but on the other hand if enough other fedi implementations go for web+ap:// then that's likely to become the standard. So it's a sort of 'let's dip our toes in and start thinking about it, but also wait and see' kind of situation, for me.

SoniEx2 commented 1 year ago

> Another thing that occurs to me is that using type on links doesn't really offer any hints to the client about what the ActivityStreams type of the Object is: is it a Person or a Note, or something else? I guess it doesn't matter too much since all it's doing is instructing clients 'put this link in the instance's search bar, it's something significant', but it would be cool to be able to include it somehow.

web+ap supports this:

web+ap://type:Note@instance.example/id
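Going only by the example above, the type hint sits in the userinfo part of the URI. The sketch below pulls it out with the standard URL parser; the interpretation of `type:<Type>` is an assumption based on that one example, and `split_type_hint` is a made-up name.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def split_type_hint(link: str) -> Tuple[str, Optional[str]]:
    """Return (https URL without userinfo, ActivityStreams type hint or None)."""
    parts = urlsplit(link)
    if parts.scheme != "web+ap":
        raise ValueError("not a web+ap link")
    # In "type:Note@host", the parser reports "type" as username, "Note" as password.
    hint = parts.password if parts.username == "type" else None
    # Drop the userinfo but keep host and any port.
    host = parts.netloc.rpartition("@")[2]
    https_url = urlunsplit(("https", host, parts.path, parts.query, parts.fragment))
    return https_url, hint


print(split_type_hint("web+ap://type:Note@instance.example/id"))
```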

tsmethurst commented 1 year ago

Also, can someone link to some actual protocol document for web+ap:// and not just fedi posts and whatnot? It's a bit confusing to have this discussion about the relative merits of each method when I only have a vague idea of what web+ap can actually do.

mirabilos commented 1 year ago

tobi dixit:

> a few months. In my view, using existing HTML properties is probably better for this than trying to shoehorn in a new protocol, but on the

OK, then let me try to make my personal use case / what I am actually interested in more clear.

I tend to share Fedi links with other people occasionally, sometimes via Fediverse, sometimes using other methods of contact like Jabber.

Currently, Fedi links begin with “https://” which makes people think they can use a webbrowser on them, which, in the case of unlisted GtS posts, will not work for known reasons. The correct way to deal with Fedi links is to paste them into the Search field of one’s Fedi client so the post can be loaded by one’s own instance and displayed by the client.

The same problem occurs when people link to Fedi posts in statuses (including the “quote toot” feature some instances have, but not limited to them). Much too often, clicking/tapping those links opens up the target in a webbrowser, which has multiple downsides:

• the webbrowser has to load the HTML+CSS+JS of the foreign instance, which is a PITA on German mobile networks at least (slow and uses up too much data from the monthly contingent)

• makes you navigate the post/thread using a foreign instance’s builtin public anonymous webclient, which tends to break using the cursor ↑/↓ keys to scroll through the site, and in general looks ugly

• by viewing them on a foreign instance’s anonymous client instead of your own, you cannot interact with those posts

• so you end up copying the URL from the browser (if it even lets you) and paste it into the search form anyway (can’t right-click/long-tap on links in FediText to copy them, it only supports clicking/tapping them to load them with some application)

If Fedi links did not use “https://” but something like “fedi://” or “fedi:https://” these problems would not happen because then the links clearly are not meant for the webbrowser.

That’s my “problem description” before I discovered that…

> other hand if enough other fedi implementations go for web+ap:// then

… FediText implements something called “web+ap://” which seems to be precisely the thing I had envisioned I’d need.

With “web+ap://” support, I will eventually be able to copy a link to a post from my own client, paste it somewhere, and either it already has “web+ap://” at the front (but not for a long time, until enough systems will support that) or I can change the https to web+ap, then I can send it off.

Currently, however, I cannot use [text](web+ap://…) or <web+ap://…> or the unadorned form (no angle brackets) to even post such a thing as a hyperlink in GoToSocial, because the sanitiser does not consider it a link. So that’s the thing I reported needs fixing.

The second thing that’s also “immediately” needed is the ability to search for web+ap:// links when pasting them into the search field of Semaphore. (I assume using them with FediText will work the same way? Or does it convert to https first?)

Everything else can wait. Please do not automatically rewrite URLs anywhere, do not automatically contact remote servers when encountering an URL, etc. and the post URL is normally also just exposed from clients so, at some point, it’s Semaphore that will switch to web+ap:// links in the @.*** of the date-and-time of the post in the post detail view.

That’s my 2¢, //mirabilos

SoniEx2 commented 1 year ago

we also note that web+ap exists as a compromise between the needs of fedi user and browser support. it is barely supported, but the support that is there is enough to make it worth considering.

it is entirely a "what can we do today" solution. if you want a "proper" solution: make a new kind of link entirely. maybe a web: link, define a namespace for it (with the IANA), give it a default fallback, get OSes to add support for it, get browsers to add support for it, and so on.

or use web+ap which we can make do with. real problems require real solutions, not ideal ones.

tsmethurst commented 1 year ago

@mirabilos right that makes sense, so to sum up: allow web+ap through the sanitizer, and handle web+ap in the URIs of search queries, right? That seems reasonable and something we can do. Do you mind if I change the title of the issue to reflect that that's what's being requested?

I'd still like to see some actual document for this thing, like, something on a website that I can just go read to understand it. As a treat.

SoniEx2 commented 1 year ago

something like this? https://github.com/fedi-to/fedi-to.github.io/blob/main/webap.md

mirabilos commented 1 year ago

tobi dixit:

> @mirabilos right that makes sense, so to sum up: allow web+ap through the sanitizer, and handle web+ap in the URIs of search queries, right?

That’s the “as far as I understand it” part, because…

> I'd still like to see some actual document for this thing, like, something on a website that I can just go read to understand it. As a treat.

… I haven’t seen anything yet either. But I guess we’ll see if/when there will be more to do.

> That seems reasonable and something we can do. Do you mind if I change the title of the issue to reflect that that's what's being requested?

Sure ☻ No complaints, the issue titles are for the developers to read.

Thanks, //mirabilos

SoniEx2 commented 1 year ago

so, recently (about 2 weeks ago) mastodon did complete changes to the remote interaction dialog.

it doesn't help GtS users tho, because it relies on sending the user to a web client (aka mastodon instance). so we devised a workaround which is compatible with Tokodon and FediText.

but we don't think GtS should have a remote interaction screen. we think GtS should just show a web+ap link somewhere. see also the discussion here about what it might look like: https://portend.place/notice/AY4Vp3CkSDhyCYqCga

mirabilos commented 1 year ago

Soni L. dixit:

> so, recently (about 2 weeks ago) mastodon did complete changes to the remote interaction dialog.
>
> it doesn't help GtS users tho, because it relies on sending the user to a web client

The problem is ending up on these web clients in the first place. They’re awful to use (compared with e.g. Semaphore) and load tons of resources, some (like floss.social) even illegally from third parties without first obtaining user consent. Mastodon is going to sell users’ PII to Cloudflare/hCaptcha soon as well, apparently.

> but we don't think GtS should have a remote interaction screen.

Agreed.

bye, //mirabilos

tsmethurst commented 1 year ago

Can we cool it with the hyperbole and stuff in this thread, please, and just focus on the issue at hand? Derailing the discussion by vaguely gesturing at mastodon 'selling PII' (what?) is not helpful or appreciated.

SoniEx2 commented 1 year ago

additional context: https://www.youtube.com/watch?v=y7RaPyN9-Gc

SoniEx2 commented 1 year ago

further context: https://fedilinks.org/2

mirabilos commented 1 year ago

@SoniEx2 hm.

> The Well-Known Protocol Handler SHALL be available

But what if there’s no http-reachable redirection it could ever possibly point to?

In GtS, we won’t need such a protocol handler at all, because we’re not going to have something to redirect to.

SoniEx2 commented 1 year ago

sorry but GtS does serve text/html for some posts, in addition to application/activity+json. if you really don't have something to redirect to, then please explain why you maintain text/html routes.

please either deprecate the text/html routes entirely, or redirect to them.

consider also this WKPH implementation, which is not intended for external consumption: https://soniex2.autistic.space/.well-known/protocol-handler?target=web%2Bfeed%3A%2F%2Fsoniex2.autistic.space%2Fmicroblog.atom

(it is intended for use purely to resolve web+feed URIs pointing to soniex2.autistic.space. importantly, the CSP doesn't even allow cross-origin fetch.)
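For reference, the shape of that URL can be reproduced by percent-encoding the whole target URI into a `target` query parameter. This sketch is inferred only from the example link above, not from the specification text; `wkph_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def wkph_url(host: str, target: str) -> str:
    """Build a well-known protocol handler URL for the given target URI."""
    # safe="" makes quote() encode "/" and ":" as well, as in the example link.
    return f"https://{host}/.well-known/protocol-handler?target={quote(target, safe='')}"


print(wkph_url(
    "soniex2.autistic.space",
    "web+feed://soniex2.autistic.space/microblog.atom",
))
```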

mirabilos commented 1 year ago

Soni L. dixit:

> sorry but GtS does serve text/html for some posts, in addition to application/activity+json. if you really don't have something to redirect to, then please explain why you maintain text/html routes.

I’m not GtS, I’m just a user.

But: GtS only serves public primary posts and, IIUC, public replies to local public primary posts.

So, a majority of posts will not be served publicly from GtS, for privacy reasons.

This is why I want to use web+ap:// in the first place: because the posts are not served for the webbrowser.

bye, //mirabilos

mirabilos commented 1 year ago

Soni L. dixit:

> please either deprecate the text/html routes entirely, or redirect to them.

I mean it’s not as if e.g. Mstdn serves *all* posts publicly either. This request isn’t appropriate.

bye, //mirabilos

tsmethurst commented 1 year ago

Locking this now because it's clear what's being requested, links have been provided, and subsequent discussion just keeps being a mess that doesn't add anything useful.