snarfed / bridgy

📣 Connects your web site to social media. Likes, retweets, mentions, cross-posting, and more...
https://brid.gy
Creative Commons Zero v1.0 Universal
707 stars 52 forks source link

Bluesky: ValueError: Expected activity_id to be at:// URI; got 3k4celwmjo22p #1575

Closed snarfed closed 10 months ago

snarfed commented 10 months ago

First Bluesky bug! Congrats and condolences @JoelOtter 😎. Looks like the discover function, ie the "Resend for post" form on the user page, is unhappy. Caught by our crash reporter; I've added you to the Google Cloud project, you should be able to see it at https://console.cloud.google.com/errors/detail/CKDu3sTr_biL4AE;time=P30D?project=brid-gy if you log in with your Google account.

Stack trace:

 ValueError: Expected activity_id to be at:// URI; got 3k4celwmjo22p
at .get_activities_response ( /layers/google.python.pip/pip/lib/python3.11/site-packages/granary/bluesky.py:985 )
at .dispatch_request ( /workspace/tasks.py:491 )
at .background_handle_exception ( /workspace/flask_background.py:43 )

Full logs from one request:

[25/Oct/2023:14:26:28 -0700] POST /_ah/queue/discover HTTP/1.1 500

Params: [('source_key', 'agdicmlkLWd5ci0LEgdCbHVlc2t5IiBkaWQ6cGxjOmZkbWU0Z2I3bXU3enJpZTdwZWF5N3RzdAw'), ('post_id', '3k2b4xcjba22o')]
Got source: Bluesky(key=Key('Bluesky', 'did:plc:fdme4gb7mu7zrie7peay7tst'), auth_entity=Key('BlueskyAuth', 'did:plc:fdme4gb7mu7zrie7peay7tst'), blocked_ids=None, created=DatetimeWithNanoseconds(2023, 10, 25, 20, 44, 17, 875687, tzinfo=datetime.timezone.utc), domain_urls=['https://snarfed.org/'], domains=['snarfed.org'], features=['listen'], last_activities_cache_json=...
Source: snarfed.org (Bluesky) did:plc:fdme4gb7mu7zrie7peay7tst, https://brid.gy/bluesky/did:plc:fdme4gb7mu7zrie7peay7tst
no refresh_token
226 lexicons loaded
Using server at https://bsky.social/
com.atproto.server.createSession: {} {'identifier': 'snarfed.org', 'password': '...'}
Validating com.atproto.server.createSession parameters
Running <function post at 0x3e84c297ff60> https://bsky.social/xrpc/com.atproto.server.createSession {'identifier': 'snarfed.org', 'password': '...'}  {'User-Agent': 'Bridgy (https://brid.gy/about)', 'Content-Type': 'application/json'}
Logged into https://bsky.social/ as did:plc:fdme4gb7mu7zrie7peay7tst, setting access_token
Validating com.atproto.server.createSession output
226 lexicons loaded
Using server at https://bsky.social/
[crash]
JoelOtter commented 10 months ago

Did a little digging into this this morning:

snarfed commented 10 months ago
  • This is a valid AT URI per the spec! However, it appears the Bluesky API can't actually resolve that handle to a DID. Doing app.bsky.feed.getPostThread on this returns a 404. Replacing joelotter.com with the DID makes it work

Interesting that getPostThread chokes on it! Hmm. This kind of surprises me; I've filed https://github.com/bluesky-social/atproto/discussions/1778.

For our purposes, we can resolve handles to DIDs with either com.atproto.identity.resolveHandle, against PDS or BGS or AppView, or by doing handle resolution ourselves with eg arroba.did.resolve_handle. I'm not sure if I have a preference between the two yet.

I'm therefore slightly unsure how to resolve this DID given base_id is a static method and so presumably doesn't have access to our client...thoughts?

Totally ok to change base_id to a class method if necessary! Should be backward compatible.

JoelOtter commented 10 months ago

Interesting, thanks! I asked about it in the Bluesky devs matrix chat and got the following info:

Looks like the current implementation looks at the local PDS database so it would only find the thread if it is synced to the PDS.

https://github.com/bluesky-social/atproto/blob/bb039d8e4ce5b7f70c4f3e86d1327e210ef24dc3/packages/pds/src/api/app/bsky/feed/getPostThread.ts#L125

To decide if a PDS should pull the post from joelotter.com's PDS would be a matter for the local PDSs allow/deny list of PDSs it is willing to resolve content from.

JoelOtter commented 10 months ago

Seems we don't actually use arroba yet in Granary! Am I ok to add it as a dependency - should I have it install from Git or from Pip?

snarfed commented 10 months ago

Hah funny, I'm in the Discord but don't really follow the matrix chat, it tended to seem more theoretical and less hands on...which fits the answer here. That answer may technically be true, but doesn't really matter since federation isn't on in prod yet, so there's only one PDS, which has every post and thread. Not to mention the AppView, which is expected to have most/all posts, even after federation, has the same problem. Both are pretty clearly requiring DID-based at:// URIs right now. 🤷

snarfed commented 10 months ago

Seems we don't actually use arroba yet in Granary! Am I ok to add it as a dependency - should I have it install from Git or from Pip?

Hmm. On second thought, base_id currently doesn't make network requests, and I'm a bit reluctant to have it start. I wonder if there's a different place to do this that would be less invasive there?

JoelOtter commented 10 months ago

Could it resolve in the original post discovery? I fear we might still end up doing it in web_url_to_at_url though which gets called from all over presumably

JoelOtter commented 10 months ago

Looks like it's not actually used anywhere yet. I'll see if I can make the change :)

snarfed commented 10 months ago

Hmm yeah. And web_url_to_at_url probably shouldn't make network requests either.

So this specific issue is unrelated to original post discovery, but in both cases, we primarily care about normalizing URLs for the user's own posts. We already have the user's handle and DID, so how about we start small: in original post discovery and the discover task, we look for the user's handle specifically, and if it's in the URL, convert that to DID. Otherwise, we do nothing.

I'm down to try to do more unified global conversion of bsky.app URL => at:// URIs and handles => DIDs, but I'm not sure we quite have a handle on the best approach yet.

snarfed commented 10 months ago

(Also fwiw Bridgy Fed uses web_url_to_at_uri, https://github.com/snarfed/bridgy-fed/blob/ea08bd91538e26da3b295a10999e7191cfd6ca13/atproto.py#L158 , but that hasn't shipped yet so it doesn't matter.)

JoelOtter commented 10 months ago

Sounds good - should I do that in post_id do you think?

JoelOtter commented 10 months ago

Or base_id really I should say

JoelOtter commented 10 months ago

Wait I reread your previous comments, never mind, will need to add a new source method!

JoelOtter commented 10 months ago

Just thinking, is this something where the "right" thing to do is actually to implement UrlCanonicalizer properly for Bluesky? Then this becomes trivial, probably

snarfed commented 10 months ago

Yeah I've been thinking about that! The problem is bsky.app vs at://. I expect we don't always want Bluesky's UrlCanonicalizer to convert bsky.app URLs to at:// URIs, but sometimes we do...?

JoelOtter commented 10 months ago

This is one of those things where it's easy to do but hard to do right. I'll put something together now though we may want to revisit this as you say!

snarfed commented 10 months ago

Confirmed fixed! Thanks again @JoelOtter!