snarfed / bridgy-fed

🌉 A bridge between decentralized social network protocols
https://fed.brid.gy/
Creative Commons Zero v1.0 Universal

Webfinger: return 404 on invalid domain #1115

Closed TomCasavant closed 1 week ago

TomCasavant commented 3 weeks ago

It looks like FediTest ran 17 tests against a bunch of different ActivityPub software for WebFinger spec compliance: https://feditest.org/blog/2024-06-05-early-results-webfinger/

There are a couple of issues in there but, just from glancing through it, the main one might be bridgy-fed returning a 504 for missing resources (webfinger.server.4_2__5_status_404_for_nonexisting_resources::status_404_for_nonexisting_resources)

snarfed commented 3 weeks ago

Thanks! I've seen these, I love what @jernst and @steve-bate have been doing. FediTest is awesome! And they've put a ton of hard work into it.

On this specific result, Steve and I cordially disagree on which HTTP status codes are appropriate for Webfinger requests and for a bridge service like BF. Most webfinger servers are authoritative and more or less self contained, but BF isn't. It fetches external data, eg web pages and Bluesky profiles, and converts those to webfinger responses. Notably, it does that for initial account creation, not just for existing accounts. If you want to bridge a web site into the fediverse, you can do that just by searching for @[domain]@web.brid.gy. That initial lookup triggers BF to fetch the web site, generate a new AP actor for it, and start bridging it.

In general, if you need to make an external fetch to serve an HTTP request, and that fetch fails, the right status code to return is 502 or 504. We should definitely add a special case for when the domain itself is invalid, eg its TLD doesn't exist! I'm happy for this issue to track that. Otherwise...🤷
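To make the 502 vs 504 distinction concrete, here is a minimal sketch of how a gateway might map the outcome of its external fetch to a response status. `FetchOutcome` and `gateway_status` are hypothetical names for illustration, not Bridgy Fed's actual code.

```python
from enum import Enum
from http import HTTPStatus


class FetchOutcome(Enum):
    """Hypothetical outcomes of the external fetch a gateway makes."""
    OK = "ok"
    UPSTREAM_ERROR = "upstream_error"  # upstream answered, but unusably
    TIMEOUT = "timeout"                # upstream never answered in time


def gateway_status(outcome: FetchOutcome) -> HTTPStatus:
    """Map a fetch outcome to the status the gateway itself returns."""
    if outcome is FetchOutcome.OK:
        return HTTPStatus.OK
    if outcome is FetchOutcome.TIMEOUT:
        return HTTPStatus.GATEWAY_TIMEOUT  # 504
    return HTTPStatus.BAD_GATEWAY  # 502
```

The invalid-domain special case is deliberately left out here, since no fetch is attempted in that case.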

snarfed commented 3 weeks ago

The relevant part of the Webfinger spec is in section 4.2:

If the "resource" parameter is a value for which the server has no information, the server MUST indicate that it was unable to match the request as per Section 10.4.5 of RFC 2616.

The interpretation of "has no information" is probably the sticking point here.

jernst commented 3 weeks ago

My view: it's actually the WebFinger spec's fault :-) They probably didn't think of gateways and so they said MUST 404. If they had thought of them, they probably would have said something more flexible.

snarfed commented 3 weeks ago

Maybe! But arguably gateways can always determine some information from the attempted fetch, even if it fails, so that section may not apply at all.

(...or maybe I'm trying too hard to weasel out of this. I don't know. 😁)

TomCasavant commented 3 weeks ago

Ooh yeah, I guess that makes sense. I was picturing it as trying and failing to access the bridgyfed user information, rather than trying and failing to access the external server in order to create the user.

steve-bate commented 3 weeks ago

On this specific result, Steve and I cordially disagree on which HTTP status codes are appropriate for Webfinger requests and for a bridge service like BF ... We should definitely add a special case for when the domain itself is invalid, eg its TLD doesn't exist! I'm happy for this issue to track that.

Just to be clear, this is the only case I've been focused on, even though that might not have been clear at the start of the conversation (I accept the blame for that). The scenario I was describing involved an invalid domain name.

The general discussion about WebFinger acting as a gateway is interesting for other reasons, but practically speaking, most clients will probably only check for a 200 or 3xx and treat everything else as equivalent to a "not found". I'd be very surprised if there are any WebFinger clients that will retry a 502/504 after some delay, but returning that code at least gives a developer the option of doing that.
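As a rough illustration of that client-side behavior, the sketch below classifies a WebFinger response status. The function name and the returned labels are assumptions made up for this example; they do not describe any particular client.

```python
def classify_webfinger_status(status: int) -> str:
    """Classify an HTTP status from a WebFinger lookup (hypothetical labels)."""
    if status == 200:
        return "found"
    if 300 <= status < 400:
        return "redirect"
    if status in (502, 504):
        # A gateway failure may be transient, so a client could retry later.
        return "retry_later"
    # Everything else is treated as equivalent to "not found".
    return "not_found"
```

A client that skips the `retry_later` branch collapses 502/504 into `not_found`, which is the behavior described above as the likely common case.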

snarfed commented 3 weeks ago

Just to be clear, this is the only case I've been focused on, even though that might not have been clear at the start of the conversation (I accept the blame for that). The scenario I was describing involved an invalid domain name.

Oh, OK! Sounds like we're on the same page overall then. Thanks for clarifying!

snarfed commented 1 week ago

Done! Webfinger requests like https://fed.brid.gy/.well-known/webfinger?resource=acct:does.not.exist@web.brid.gy now return HTTP 400.

steve-bate commented 1 week ago

Great! Just curious... why did you decide on HTTP 400 vs 404 (since the request isn't malformed, just not a valid domain)?

snarfed commented 1 week ago

No good reason! If the domain is invalid, arguably either status could be appropriate, but I don't feel strongly. I've changed it to 404, should be deployed in ~5m.
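For illustration only, here is a minimal sketch of the invalid-domain special case, assuming the check is "does the bridged domain end in a known TLD". The tiny `HYPOTHETICAL_TLDS` set and the function name are made up for this example; how Bridgy Fed actually validates domains is not shown in this thread.

```python
from typing import Optional

# Hypothetical, deliberately tiny sample; a real check would need a full TLD list.
HYPOTHETICAL_TLDS = frozenset({"com", "org", "net", "gy"})


def invalid_domain_status(resource: str,
                          known_tlds: frozenset = HYPOTHETICAL_TLDS) -> Optional[int]:
    """Return 404 if the bridged domain's TLD is unknown, else None.

    Expects resources shaped like acct:[domain]@web.brid.gy, where the
    user part is the web site's domain. None means "no early answer,
    go ahead and try the external fetch".
    """
    if not resource.startswith("acct:"):
        return None
    user, sep, _host = resource[len("acct:"):].rpartition("@")
    if not sep:
        return None
    tld = user.rpartition(".")[2].lower()
    return 404 if tld not in known_tlds else None
```

With this sample set, `acct:does.not.exist@web.brid.gy` yields 404 because `exist` is not a listed TLD, while `acct:example.com@web.brid.gy` yields `None` and would proceed to the fetch.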