snarfed / bridgy-fed

🌉 A bridge between decentralized social network protocols
https://fed.brid.gy
Creative Commons Zero v1.0 Universal

ActivityPub => ATProto hashtags #980

Closed: snarfed closed this 4 months ago

snarfed commented 5 months ago

Not working yet. Example: https://indieweb.social/@laurenshof/112359491459258562 => https://bsky.app/profile/laurenshof.indieweb.social.ap.brid.gy/post/3krdoh7rypeb2

snarfed commented 5 months ago

AS2:

{
  "type": "Create",
  "id": "https://indieweb.social/users/laurenshof/statuses/112359491459258562/activity",
  "actor": "https://indieweb.social/users/laurenshof",
  "published": "2024-04-30T09:36:19Z",
  "object": {
    "type": "Note",
    "id": "https://indieweb.social/users/laurenshof/statuses/112359491459258562",
    "url": "https://indieweb.social/@laurenshof/112359491459258562",
    "published": "2024-04-30T09:36:19Z",
    "attributedTo": "https://indieweb.social/users/laurenshof",
    "content": "<p>test post for <a href=\"https://indieweb.social/tags/atproto\" class=\"mention hashtag\" rel=\"tag\">#<span>atproto</span></a> atproto</p>",
    "attachment": [],
    "tag": [{
      "type": "Hashtag",
      "href": "https://indieweb.social/tags/atproto",
      "name": "#atproto"
    }]
  },
  ...
}

...converted to AS1:

{
  "objectType": "activity",
  "verb": "post",
  "id": "https://indieweb.social/users/laurenshof/statuses/112359491459258562/activity",
  "actor": {
    "id": "https://indieweb.social/users/laurenshof",
    ...
  },
  "published": "2024-04-30T09:36:19Z",
  "object": {
    "objectType": "note",
    "id": "https://indieweb.social/users/laurenshof/statuses/112359491459258562",
    "url": "https://indieweb.social/@laurenshof/112359491459258562",
    "author": {"id": "https://indieweb.social/users/laurenshof"},
    "published": "2024-04-30T09:36:19Z",
    "content": "<p>test post for <a href=\"https://indieweb.social/tags/atproto\" class=\"mention hashtag\" rel=\"tag\">#<span>atproto</span></a> atproto</p>",
    "tags": [{
      "href": "https://indieweb.social/tags/atproto",
      "displayName": "#atproto"
    }]
  }
}

...converted to ATProto:

{
  "$type": "app.bsky.feed.post",
  "createdAt": "2024-04-30T09:36:19.000Z",
  "text": "test post for #atproto atproto"
}
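For reference, the converted record above has no `facets`, which is presumably why the hashtag shows up on Bluesky as plain text rather than a link. Below is a minimal sketch (not Bridgy Fed's actual code) of attaching an `app.bsky.richtext.facet#tag` facet. The helper name `tag_facet` is hypothetical; the two details it tries to get right are that facet indices are UTF-8 byte offsets, not character offsets, and that the `tag` value omits the leading `#`.

```python
import json


def tag_facet(text: str, hashtag: str) -> dict:
    """Hypothetical helper: build an app.bsky.richtext.facet with a #tag
    feature for the first exact occurrence of `hashtag` (e.g. "#atproto")."""
    start_char = text.index(hashtag)  # raises ValueError if the tag isn't in the text
    # Facet indices are UTF-8 byte offsets, not Python character offsets.
    byte_start = len(text[:start_char].encode("utf-8"))
    byte_end = byte_start + len(hashtag.encode("utf-8"))
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [{
            "$type": "app.bsky.richtext.facet#tag",
            "tag": hashtag.removeprefix("#"),  # the tag value omits the leading '#'
        }],
    }


post = {
    "$type": "app.bsky.feed.post",
    "createdAt": "2024-04-30T09:36:19.000Z",
    "text": "test post for #atproto atproto",
}
post["facets"] = [tag_facet(post["text"], "#atproto")]
print(json.dumps(post, indent=2))
```

For this post the facet spans bytes 14 to 22. With non-ASCII text before the tag (e.g. "café #x"), byte and character offsets diverge, which is the main thing to watch for.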
snarfed commented 5 months ago

Woo, fixed!

austinhuang0131 commented 4 months ago

Are you sure it works?

snarfed commented 4 months ago

@austinhuang0131 looks like you're hitting https://github.com/snarfed/bridgy-fed/issues/1010

austinhuang0131 commented 4 months ago

It is in the post text, though. It's just that if the last line of the content consists solely of hashtags, Mastodon displays it as a footer; that shouldn't have any effect on the ActivityPub representation itself.

$ curl https://mstdn.party/users/austin/statuses/112405359754338066 -H "Accept: application/activity+json" | jq
{
  "@context": [...],
  "id": "https://mstdn.party/users/austin/statuses/112405359754338066",
  ...,
  "content": "<p>&quot;Would u like to buy a refurbished computer&quot;</p><p><a href=\"https://mstdn.party/tags/PostIt\" class=\"mention hashtag\" rel=\"tag\">#<span>PostIt</span></a></p>",
  "contentMap": {
    "en": "<p>&quot;Would u like to buy a refurbished computer&quot;</p><p><a href=\"https://mstdn.party/tags/PostIt\" class=\"mention hashtag\" rel=\"tag\">#<span>PostIt</span></a></p>"
  },
  "attachment": [...],
  "tag": [
    {
      "type": "Hashtag",
      "href": "https://mstdn.party/tags/postit",
      "name": "#postit"
    }
  ],
  "replies": {...}
}
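To illustrate that the footer hashtag is still ordinary content: a rough sketch, using only Python's standard `html.parser`, that flattens the `content` HTML above into plain text with one paragraph per `<p>`. The `TextExtractor` class and `html_to_text` function are hypothetical and are not Bridgy Fed's HTML-to-text converter.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text from Mastodon-style HTML, one entry per <p>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &quot; becomes "
        self.paragraphs: list[str] = []
        self._current: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._current.append("\n")

    def handle_endtag(self, tag):
        if tag == "p":
            self.paragraphs.append("".join(self._current))
            self._current = []

    def handle_data(self, data):
        # Text inside <a> and <span> (e.g. "#" + "PostIt") is kept as-is.
        self._current.append(data)

    def close(self):
        super().close()
        if self._current:  # text after the last </p>, if any
            self.paragraphs.append("".join(self._current))
            self._current = []


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


content = ('<p>&quot;Would u like to buy a refurbished computer&quot;</p>'
           '<p><a href="https://mstdn.party/tags/PostIt" class="mention hashtag" '
           'rel="tag">#<span>PostIt</span></a></p>')
print(html_to_text(content))
```

The last paragraph comes out as `#PostIt`, so a text search for the tag has something to find regardless of how Mastodon renders it.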
snarfed commented 4 months ago

Good point!

I haven't looked into this deeply yet. I'll follow up in #1010.

snarfed commented 4 months ago

Reopening; these seem to be unreliable. E.g., here's one that didn't work: https://mastodon.online/@emarktaylor/112480973885137226 => https://bsky.app/profile/emarktaylor.mastodon.online.ap.brid.gy/post/3kszmvscesxh2

...and here's one with two, one worked, one didn't 🤪: https://mastodon.online/@emarktaylor/112480873556819998 => https://bsky.app/profile/emarktaylor.mastodon.online.ap.brid.gy/post/3kszlfj5l2wf2

qazmlp commented 4 months ago

I think it's case sensitivity. Mastodon apparently sends hashtag names as stored in its database rather than as written in the post content. I wonder whether that also affects Unicode collation beyond case 😬

(In the first example, it's "#USPol" in the text vs. "#uspol" in the tag; in the second, it's "#Dogs"/"#dogs" but "#Minnesota"/"#Minnesota".)
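A minimal sketch of the case-insensitive matching this suggests, assuming the tag's `name` from the AS2 object is searched for in the plain post text. `find_hashtag` is a hypothetical helper. Note that Python's `re.IGNORECASE` does per-character Unicode case matching, so it won't treat multi-character case folds such as "ß"/"ss" as equal.

```python
import re
from typing import Optional


def find_hashtag(text: str, tag_name: str) -> Optional[re.Match]:
    """Hypothetical helper: find an AS2 tag name such as "#uspol" in the post
    text, ignoring case, since the tag object's casing may differ from what
    the author typed."""
    return re.search(re.escape(tag_name), text, flags=re.IGNORECASE)


m = find_hashtag("Some news about #USPol today", "#uspol")
print(m.group(), m.span())  # #USPol (16, 22)
```

On its own this would also match inside a longer tag such as #USPolitics, so real matching needs boundary checks as well.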

hybridhavoc commented 4 months ago

That's interesting. For testing, I made this post (Mastodon, Bluesky).

Neither registered as camel-cased in the JSON, even though the TuneTuesday tag is camel-cased on my Mastodon instance (I updated it myself long ago).

Made another post (Mastodon, Bluesky).

This time the tunetuesday hashtag looks good, but not the storysong one.

Posted one final one (Mastodon, Bluesky).

It does seem like case sensitivity is an issue, but it's kinda bothersome that the tag's case doesn't necessarily match what's on the OP's server. In the second post, I think the period directly following the storysong hashtag may have kept it from being processed properly? Just speculation there.
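If the trailing period is the culprit, one way to make a text search tolerant of punctuation (and of hashtags at the very start or end of the text) is to require only that the match isn't preceded or followed by a word character. A sketch under that assumption, using a hypothetical helper `find_hashtag_bounded`; this is not Bridgy Fed's actual regex.

```python
import re
from typing import Optional


def find_hashtag_bounded(text: str, tag_name: str) -> Optional[re.Match]:
    r"""Hypothetical helper: case-insensitive search for a whole hashtag.

    (?<!\w)  the '#' is at the start of the text or after a non-word character
    (?!\w)   the tag is followed by the end of the text or a non-word
             character, so "#StorySong." matches but "#storysongs" doesn't
    """
    pattern = r"(?<!\w)" + re.escape(tag_name) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE)


print(find_hashtag_bounded("A new #StorySong.", "#storysong").group())       # #StorySong
print(find_hashtag_bounded("#TuneTuesday is here", "#tunetuesday").start())  # 0
print(find_hashtag_bounded("I love #storysongs", "#storysong"))              # None
```

Because the lookarounds are zero-width, the returned span covers only the hashtag itself, which is the range a facet would need.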

TheLastBoyScoutUK commented 4 months ago

Found this issue because I'm having the same problem with hashtags. Looks like it could indeed be case-related.

Working ~https://fed.brid.gy/r/https://bsky.app/profile/did:plc:o66bj7shwur23ic7zpermpiw/post/3kt3hyskm4k24 => https://bsky.app/profile/hallenbeck.thelastboyscout.uk/post/3kt3hyskm4k24~

Not Working https://mastodon.social/@hallenbeck/112487404523932742 => https://bsky.app/profile/hallenbeck.mastodon.social.ap.brid.gy/post/3kt4i7hn3dw62

Update

Ignore the working one, I was in a hurry and didn't notice the Mastodon post was bridged from Bluesky to Mastodon. 🤦🏻‍♂️

Here's one I posted on Mastodon with a lowercase hashtag that didn't work on Bluesky either:

https://mastodon.social/@hallenbeck/112490349136364796 => https://bsky.app/profile/hallenbeck.mastodon.social.ap.brid.gy/post/3kt5s3xsnfu22

snarfed commented 4 months ago

Thanks for the examples, all! Looks like case sensitivity was definitely one root cause here; punctuation on either side of the hashtag in the post text was another; and hashtags at the very beginning or end of the post text occasionally caused a third. I think I've fixed all of those. Example: https://indieweb.social/@snarfed/112554052674489553 => https://bsky.app/profile/snarfed.indieweb.social.ap.brid.gy/post/3ku23dhx4bho2

Sadly, AS2 dropped AS1's startIndex and length from tags, which makes this harder. (Combining startIndex and length with HTML content was also problematic, I get it, but still. 😕) Without those, processing AS2 hashtags requires searching the post text for the hashtag string, which will always be a tricky balance between false negatives and false positives, and I don't know that I'll ever get it 100% perfect. For instance, in the post above, the #g hashtag didn't make it across because it's terminated by the 🫖 emoji. The AS2 tag object is fine, but my regex for searching post text doesn't currently handle emoji termination. Sigh.
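A sketch of how emoji termination can trip up a text search, assuming (hypothetically) a regex that lists the allowed terminator characters after a tag: 🫖 isn't in the list, so the match fails, whereas a negative lookahead for `\w` accepts it because emoji aren't word characters in Python's `re`. Both patterns and the sample text are illustrative only, not the regex Bridgy Fed actually uses.

```python
import re

text = "Time for #g🫖"
tag = re.escape("#g")

# Hypothetical whitelist approach: the tag must be followed by whitespace,
# common punctuation, or the end of the text.
WHITELIST_END = r"(?=[\s.,!?;:)\]]|$)"
# Alternative: the tag just mustn't be followed by a word character.
NOT_WORD_END = r"(?!\w)"

print(re.search(tag + WHITELIST_END, text))         # None: 🫖 isn't whitelisted
print(re.search(tag + NOT_WORD_END, text).group())  # #g: 🫖 isn't a \w character
```

The lookahead version trades one failure mode for another: it treats every non-word character as a terminator, so if a server allows some non-`\w` character inside hashtags, searching for a shorter tag could falsely match the prefix of a longer one.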

That's pretty unusual though, so I'm tentatively marking this as fixed for now. Feel free to reopen or open new issues for other real world problems you all see. And thanks again for all the sleuthing!