snarfed closed this 10 months ago
Specifically, I think we just need to implement Bluesky.canonicalize_url to accept and return at:// URIs. Example:
https://github.com/snarfed/bridgy/blob/71225d9f42a8e39a6bfbd7fdb19e43d85507535a/flickr.py#L87-L95
Called from here:
Yep should be doable easily enough! Will do some experimentation
Hmm. Have done the canonicalisation and it seems to pick up the Bluesky content OK using the discover endpoint, but doesn't identify any webmention targets.
Feel free to look at the SyndicatedPost entities in the datastore to see if they're what you expect! https://console.cloud.google.com/datastore/entities/query?project=brid-gy
I’m running locally alas! Is there a GUI for the emulator at all?
Not any more, sadly, but you can look at them in a python shell:
# in virtualenv
env APP_ID=brid-gy python
from oauth_dropins.webutil.appengine_config import ndb_client
from bluesky import Bluesky
context = ndb_client.context()
context.__enter__()
snarfed = Bluesky.get_by_id('did:plc:fdme4gb7mu7zrie7peay7tst')
print(snarfed)
The problem appears to be that the SyndicatedPosts get inserted with their syndication field as the at:// one, but when doing original post discovery for the backfeed it looks them up based on the post URL from Bluesky, which is the http:// one.
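To make the mismatch concrete, here is a minimal sketch that uses a plain dict as a stand-in for the datastore. The names `syndicated` and `find_original` and the example URLs are hypothetical, not Bridgy's actual model or API.

```python
# Hypothetical stand-in for the SyndicatedPost lookup; not Bridgy's actual model.
AT_URI = "at://did:plc:example/app.bsky.feed.post/abc123"
BSKY_URL = "https://bsky.app/profile/did:plc:example/post/abc123"

# Stored during syndication discovery, keyed by the at:// URI.
syndicated = {AT_URI: "https://example.com/original-post"}


def find_original(syndication_url):
    """Return the original post URL for a syndication URL, or None if unknown."""
    return syndicated.get(syndication_url)


# Looking up by the stored form succeeds...
stored_hit = find_original(AT_URI)
# ...but backfeed looks up by the web URL of the same post, which misses.
backfeed_hit = find_original(BSKY_URL)
```

Both strings identify the same post, but because the key is an exact string match the second lookup finds nothing, so no webmention targets are identified.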
I'm not sure what to do here. If we were in a clean environment I guess we could just always canonicalise everything to an at:// URL, but that would presumably break all existing relationships in the DB. We could do it the other way, which wouldn't break any (working) existing data, but that strikes me as very non-future-proof.
Understood, that makes sense.
On the one hand, this is a Bluesky integration, not an ATProto integration, so I'm reluctant to go too deep into ATProto itself. On the other hand, at:// URIs are probably the way to go. Even after federation, we can switch to talking to the AppView and still expect to get data from all/most PDSes.
We could backfill existing SyndicatedPosts and convert their URLs, but we could just let OPD create all new ones and ignore the old bsky.app ones. I'm fine either way.
If this is something that would fix itself without intervention on the next crawl, I’d be fine with that. The issue, I guess, is whether it would cause duplicate webmentions to be fired.
Yes! It would effectively fix itself, by storing and using new SyndicatedPost entities with at:// URIs. It might indeed send dupe wms, but the source and target URLs should be the same, so that should be fine.
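As a rough illustration of why dupes with the same source and target tend to be harmless, here is a sketch of a receiver that keys mentions by the (source, target) pair. The `received` store and `receive_webmention` function are hypothetical; real receivers may behave differently.

```python
# Hypothetical receiver-side store keyed by (source, target); not any real receiver's code.
received = {}


def receive_webmention(source, target):
    """Record a webmention; return True if it is new, False if it repeats a known pair."""
    key = (source, target)
    is_new = key not in received
    # A repeat overwrites the same entry rather than adding a second mention.
    received[key] = received.get(key, 0) + 1
    return is_new


first = receive_webmention("https://brid.gy/example-source", "https://example.com/post")
second = receive_webmention("https://brid.gy/example-source", "https://example.com/post")
```

Under that assumption, the second send updates the existing entry instead of showing up as a second response.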
The fix might be as easy as changing Bluesky.URL_CANONICALIZER to accept both bsky.app and at:// URIs and always emit at:// URIs.
One catch is that we probably still want to use bsky.app URLs in the underlying Response.response_json AS1 objects' url property, since that ends up in human-visible links that people see and click on. I haven't thought through how easy it will be to keep those different from the syndication URLs that we do OPD on. Maybe easy?...maybe not.
Yeah, I was going to say: on reflection, the at:// URIs would be useless to a backfeed receiver. It would actually be pretty easy to just canonicalise everything to a bsky.app URL for now. We could use DIDs rather than handles in them, so they should be pretty solid well into the future. Federation is obviously its own fairly large problem, but I feel like we’ll have a lot to solve all in one go when that comes in anyway. Alternatively, we might need to look into separating out a “silo view” and “user view” of the silo URL, but that’s a refactor I wouldn’t be comfortable doing myself.
Canonicalize to bsky.app works for me!
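A minimal sketch of what canonicalizing to DID-based bsky.app URLs could look like, accepting either form as input. This is not the code that was deployed: `to_bsky_app_url` and the `resolve_handle` callback are hypothetical, and the URL shapes are assumed from the post collection app.bsky.feed.post.

```python
import re

# Assumed shapes for post identifiers; either an at:// URI or a bsky.app web URL.
POST_RE = re.compile(
    r"^(?:at://(?P<at_repo>[^/]+)/app\.bsky\.feed\.post/(?P<at_rkey>[^/?#]+)"
    r"|https://bsky\.app/profile/(?P<web_repo>[^/]+)/post/(?P<web_rkey>[^/?#]+))$"
)


def to_bsky_app_url(url, resolve_handle=None):
    """Hypothetical helper: return a DID-based bsky.app post URL, or None.

    resolve_handle is an optional callable mapping a handle to a DID; it is an
    assumption of this sketch, used only when the input names a handle.
    """
    m = POST_RE.match(url)
    if not m:
        return None
    repo = m.group("at_repo") or m.group("web_repo")
    rkey = m.group("at_rkey") or m.group("web_rkey")
    if not repo.startswith("did:"):
        # Handles can change, so only emit a URL once we have a DID.
        if resolve_handle is None:
            return None
        repo = resolve_handle(repo)
    return f"https://bsky.app/profile/{repo}/post/{rkey}"
```

With something like this, a stored at:// synd link and the bsky.app URL seen during backfeed would normalize to the same string, so the lookup key matches in both directions.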
I actually think we may be pretty ok as is for federation without any big changes, assuming we can switch all of our API requests over to the AppView (api.bsky.social) and they'll Just Work? Not 100% sure on that, but fairly confident. We'll see.
Deployed! Let's see how it works on https://brid.gy/bluesky/did:plc:s2koow7r6t7tozgd4slc3dsg and https://brid.gy/bluesky/did:plc:bnllqqdlaspfnvesydntke4e ...
I'm unable to get it to work for this post now it's deployed. Bridgy appears to find the relationship now but the responses to the post on Bluesky don't seem to trigger webmentions. Tried doing a recrawl/repoll/etc, nothing
(My Bridgy page: https://brid.gy/bluesky/did:plc:ioz4ztghfznx4s5s4jxqiqun )
Hmm! Looks like this poll found it and canonicalized the URL to bsky.app correctly: https://brid.gy/log?module=background&start_time=1698761253&key=agdicmlkLWd5ci0LEgdCbHVlc2t5IiBkaWQ6cGxjOmlvejR6dGdoZnpueDRzNXM0anhxaXF1bgw
Looks like there was only one response from someone else, a like. I clicked on its retry button in Bridgy and that finally did it. 🤷‍♂️
I uh...forgot about that button 🤦
Interesting surprise: two of the early beta testers who signed up to try the new Bluesky support use at:// URIs as their synd links, not bsky.app links: https://github.com/snarfed/bridgy/issues/1453#issuecomment-1780105260 . @JoelOtter those should be pretty straightforward to support too, right?