snarfed / bridgy

📣 Connects your web site to social media. Likes, retweets, mentions, cross-posting, and more...
https://brid.gy
Creative Commons Zero v1.0 Universal

Bluesky: support `at://` synd links #1579

Closed by snarfed 10 months ago

snarfed commented 10 months ago

Interesting surprise, two of the early beta testers who signed up to try the new Bluesky support use at:// URIs as their synd links, not bsky.app links: https://github.com/snarfed/bridgy/issues/1453#issuecomment-1780105260 . @JoelOtter those should be pretty straightforward to support too, right?

snarfed commented 10 months ago

Specifically I think we just need to implement Bluesky.canonicalize_url to accept and return at:// URIs. Example:

https://github.com/snarfed/bridgy/blob/71225d9f42a8e39a6bfbd7fdb19e43d85507535a/flickr.py#L87-L95

Called from here:

https://github.com/snarfed/bridgy/blob/71225d9f42a8e39a6bfbd7fdb19e43d85507535a/original_post_discovery.py#L526-L534

JoelOtter commented 10 months ago

Yep should be doable easily enough! Will do some experimentation

JoelOtter commented 10 months ago

Hmm. Have done the canonicalisation and it seems to pick up the Bluesky content OK using the discover endpoint, but doesn't identify any webmention targets.

snarfed commented 10 months ago

Feel free to look at the SyndicatedPost entities in the datastore to see if they're what you expect! https://console.cloud.google.com/datastore/entities/query?project=brid-gy

JoelOtter commented 10 months ago

I’m running locally alas! Is there a GUI for the emulator at all?

snarfed commented 10 months ago

Not any more, sadly, but you can look at them in a python shell:

# in virtualenv, start a Python shell
env APP_ID=brid-gy python

# then, inside that Python shell
from oauth_dropins.webutil.appengine_config import ndb_client
from bluesky import Bluesky

context = ndb_client.context()
context.__enter__()
snarfed = Bluesky.get_by_id('did:plc:fdme4gb7mu7zrie7peay7tst')
print(snarfed)

JoelOtter commented 10 months ago

The problem appears to be that the SyndicatedPosts get inserted with their syndication field as the at:// one, but when doing original post discovery for the backfeed it looks them up based on the post URL from Bluesky, which is the http:// one.

I'm not sure what to do here. If we were in a clean environment I guess we could just always canonicalise everything to an at:// URL, but that would presumably break all existing relationships in the DB. We could do it the other way, which wouldn't break any (working) existing data, but that strikes me as very non-future-proof.
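To make the mismatch concrete, here is a minimal sketch that uses a plain dict as a stand-in for the datastore. The names `syndicated_posts`, `store`, and `lookup` are hypothetical and are not Bridgy's actual API, and the DID, record key, and example.com URL are made up for illustration.

```python
# Hypothetical stand-in for SyndicatedPost entities, keyed by syndication URL.
syndicated_posts = {}

def store(syndication_url, original_url):
    """Record that original_url was syndicated to syndication_url."""
    syndicated_posts[syndication_url] = original_url

def lookup(syndication_url):
    """Return the original post URL, or None if no relationship is stored."""
    return syndicated_posts.get(syndication_url)

# The original post's synd link is an at:// URI, so that is what gets stored.
store('at://did:plc:example/app.bsky.feed.post/3kabc',
      'https://example.com/my-post')

# Backfeed looks up by the post URL reported for the Bluesky post,
# which is the bsky.app form, so the raw strings never match.
found = lookup('https://bsky.app/profile/did:plc:example/post/3kabc')
print(found)
```

Whichever form is chosen, the store and the lookup both need to go through the same canonicalization for the relationship to be found.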

snarfed commented 10 months ago

Understood, that makes sense.

On the one hand, this is a Bluesky integration, not an ATProto integration, so I'm reluctant to go too deep into ATProto itself. On the other hand, at:// URIs are probably the way to go. Even after federation, we can switch to talking to the AppView and still expect to get data from all/most PDSes.

We could backfill existing SyndicatedPosts and convert their URLs, or we could just let original post discovery (OPD) create all new ones and ignore the old bsky.app ones. I'm fine either way.

JoelOtter commented 10 months ago

If this is something that would fix itself without intervention on the next crawl, I’d be fine with that. The issue, I guess, is whether it would cause duplicate webmentions to be fired.

snarfed commented 10 months ago

Yes! It would effectively fix itself, by storing and using new SyndicatedPost entities with at:// URIs. It might indeed send dupe webmentions, but the source and target URLs should be the same, so that should be fine.

snarfed commented 10 months ago

The fix might be as easy as changing Bluesky.URL_CANONICALIZER to accept both bsky.app and at:// URIs and always emit at:// URIs.

One catch is that we probably still want to use bsky.app URLs in the underlying Response.response_json AS1 objects' url property, since that ends up in human visible links that people see and click on. I haven't thought through how easy it will be to keep those different from the syndication URLs that we do OPD on. Maybe easy?...maybe not.
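A rough sketch of that split, assuming post URLs of the form `https://bsky.app/profile/<did-or-handle>/post/<rkey>` and post records in the `app.bsky.feed.post` collection. The function `to_at_uri`, the two regexes, and the sample object are hypothetical and are not Bridgy's actual `URL_CANONICALIZER`.

```python
import re

# Hypothetical patterns for the two forms a Bluesky post link can take.
BSKY_APP_POST_RE = re.compile(
    r'^https://bsky\.app/profile/(?P<repo>[^/]+)/post/(?P<rkey>[^/?#]+)$')
AT_POST_RE = re.compile(
    r'^at://(?P<repo>[^/]+)/app\.bsky\.feed\.post/(?P<rkey>[^/?#]+)$')

def to_at_uri(url):
    """Return the at:// form of a post link, or None if it is unrecognized."""
    if AT_POST_RE.match(url):
        return url
    match = BSKY_APP_POST_RE.match(url)
    if match:
        return f"at://{match['repo']}/app.bsky.feed.post/{match['rkey']}"
    return None

# AS1-style object: 'url' stays a human-clickable bsky.app link...
obj = {
    'objectType': 'note',
    'url': 'https://bsky.app/profile/did:plc:example/post/3kabc',
}
# ...while the key used for original post discovery is derived separately.
lookup_key = to_at_uri(obj['url'])
print(lookup_key)
```

Deriving the lookup key at query time, rather than overwriting `url`, is one way to keep the two apart. Note that a handle in the profile segment would pass straight through here, so the result is only stable when the link already uses a DID.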

JoelOtter commented 10 months ago

Yeah, was going to say, on reflection the at:// URIs would be useless to a backfeed receiver. It would actually be pretty easy to just canonicalise everything to a bsky.app URL for now. We could use DIDs rather than handles in them, so they should be pretty solid well into the future. Federation is obviously its own fairly large problem, but I feel like we’ll have a lot to solve all in one go when that comes in anyway? Alternatively, we would possibly need to look into separating out a “silo view” and “user view” of the silo URL, but that’s a refactor I wouldn’t be comfortable doing myself.
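For illustration, canonicalising both forms to a bsky.app URL could look roughly like the sketch below. It is not the code that was actually deployed: `to_bsky_app_url` and the regexes are hypothetical, the DID and record key are made up, and handles are passed through unchanged because resolving a handle to a DID needs a network lookup that is left out here.

```python
import re

# Hypothetical patterns for at:// post URIs and bsky.app post URLs.
AT_POST_RE = re.compile(
    r'^at://(?P<repo>[^/]+)/app\.bsky\.feed\.post/(?P<rkey>[^/?#]+)$')
BSKY_APP_POST_RE = re.compile(
    r'^https://bsky\.app/profile/(?P<repo>[^/]+)/post/(?P<rkey>[^/?#]+)$')

def to_bsky_app_url(url):
    """Canonicalise a Bluesky post link to its bsky.app form.

    Returns None when the input matches neither known form.
    """
    match = AT_POST_RE.match(url) or BSKY_APP_POST_RE.match(url)
    if not match:
        return None
    return f"https://bsky.app/profile/{match['repo']}/post/{match['rkey']}"

# Both forms of the same post end up as the same string, so a synd link
# stored from either one can be found again during backfeed.
from_at = to_bsky_app_url('at://did:plc:example/app.bsky.feed.post/3kabc')
from_web = to_bsky_app_url('https://bsky.app/profile/did:plc:example/post/3kabc')
print(from_at == from_web, from_at)
```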

snarfed commented 10 months ago

Canonicalize to bsky.app works for me!

I actually think we may be pretty ok as is for federation without any big changes, assuming we can switch all of our API requests over to the AppView (api.bsky.social) and they'll Just Work? Not 100% sure on that, but fairly confident. We'll see.

snarfed commented 10 months ago

Deployed! Let's see how it works on https://brid.gy/bluesky/did:plc:s2koow7r6t7tozgd4slc3dsg and https://brid.gy/bluesky/did:plc:bnllqqdlaspfnvesydntke4e ...

JoelOtter commented 10 months ago

I'm unable to get it to work for this post now that it's deployed. Bridgy appears to find the relationship now, but the responses to the post on Bluesky don't seem to trigger webmentions. Tried doing a recrawl/repoll/etc, nothing.

JoelOtter commented 10 months ago

(My Bridgy page: https://brid.gy/bluesky/did:plc:ioz4ztghfznx4s5s4jxqiqun )

snarfed commented 10 months ago

Hmm! Looks like this poll found it and canonicalized the URL to bsky.app correctly: https://brid.gy/log?module=background&start_time=1698761253&key=agdicmlkLWd5ci0LEgdCbHVlc2t5IiBkaWQ6cGxjOmlvejR6dGdoZnpueDRzNXM0anhxaXF1bgw

snarfed commented 10 months ago

Looks like there was only one response from someone else, a like. I clicked on its retry button in Bridgy and that finally did it. 🤷‍♂️

JoelOtter commented 10 months ago

I uh...forgot about that button 🤦