yuletide / nitterbot

A Mastodon bot that replaces Twitter links with Nitter links, so they can be viewed without ads or tracking
https://botsin.space/@nitterbot

Handle links where url is hidden in html #10

Open yuletide opened 1 year ago

yuletide commented 1 year ago

Example: https://mastodon.social/@spiegelsche@f.3ischn.de/109582649657814621

2022-12-27T00:11:01Z app[c2c6b821] sjc [info]===== found mention in reply to yuletide id 109391862882784405 =====
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]{'id': 109582736490955865, 'created_at': datetime.datetime(2022, 12, 27, 0, 11, tzinfo=tzlocal()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://mastodon.social/users/yuletide/statuses/109582736425325992', 'url': 'https://mastodon.social/@yuletide/109582736425325992', 'replies_count': 0, 'reblogs_count': 0, 'favourites_count': 0, 'edited_at': None, 'favourited': False, 'reblogged': False, 'muted': False, 'bookmarked': False, 'content': '<p><span class="h-card"><a href="https://botsin.space/@nitterbot" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>nitterbot</span></a></span> can you help with this link</p>', 'filtered': [], 'reblog': None, 'account': {'id': 109391862882784405, 'username': 'yuletide', 'acct': 'yuletide@mastodon.social', 'display_name': 'alex yuletide', 'locked': False, 'bot': False, 'discoverable': True, 'group': False, 'created_at': datetime.datetime(2022, 6, 21, 0, 0, tzinfo=tzlocal()), 'note': '<p>Spatial solutions arch &amp; web dev, social justice, civic tech, heavy metal.  Available for work! 
\u2029Proud parent to <span class="h-card"><a href="https://botsin.space/@nitterbot" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>nitterbot</span></a></span>\u2028\u2029\u2029Past: Mapbox Solutions Architect &amp; Tech Lead @ Community Team, <span class="h-card"><a href="https://mastodon.social/@recursecenter" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>recursecenter</span></a></span> fellow, founder of civic tech startup now part of @granicus, @codeforamerica fellow, @esri\u2029\u2028\u2029<a href="https://mastodon.social/tags/vegetarian" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>vegetarian</span></a> <a href="https://mastodon.social/tags/zen" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>zen</span></a> <a href="https://mastodon.social/tags/metal" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>metal</span></a> <a href="https://mastodon.social/tags/bassmusic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bassmusic</span></a> <a href="https://mastodon.social/tags/dj" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>dj</span></a> <a href="https://mastodon.social/tags/maps" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>maps</span></a> <a href="https://mastodon.social/tags/photography" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>photography</span></a> <a href="https://mastodon.social/tags/webdev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webdev</span></a> <a href="https://mastodon.social/tags/politics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>politics</span></a></p>', 'url': 'https://mastodon.social/@yuletide', 'avatar': 
'https://files.botsin.space/cache/accounts/avatars/109/391/862/882/784/405/original/0efc492b3538e902.png', 'avatar_static': 'https://files.botsin.space/cache/accounts/avatars/109/391/862/882/784/405/original/0efc492b3538e902.png', 'header': 'https://files.botsin.space/cache/accounts/headers/109/391/862/882/784/405/original/1f2a8c1cc92143b4.png', 'header_static': 'https://files.botsin.space/cache/accounts/headers/109/391/862/882/784/405/original/1f2a8c1cc92143b4.png', 'followers_count': 150, 'following_count': 88, 'statuses_count': 171, 'last_status_at': datetime.datetime(2022, 12, 27, 0, 0), 'emojis': [], 'fields': [{'name': 'Birdsite', 'value': '<a href="HTTPS://twitter.com/yuletide" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible"></span><span class="">HTTPS://twitter.com/yuletide</span><span class="invisible"></span></a>', 'verified_at': None}, {'name': 'LinkedSite', 'value': '<a href="https://linkedin.com/in/alexyule" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="">linkedin.com/in/alexyule</span><span class="invisible"></span></a>', 'verified_at': None}]}, 'media_attachments': [], 'mentions': [{'id': 109543657746642932, 'username': 'nitterbot', 'url': 'https://botsin.space/@nitterbot', 'acct': 'nitterbot'}], 'tags': [], 'emojis': [], 'card': None, 'poll': None}
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]filtered status @nitterbot can you help with this link
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]no birdsite found, checking parent
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]checking parent

Current behavior: We use HTMLParser to strip out all HTML, but it turns out statuses can be rich-formatted, which explains why this exists in the first place. Some statuses have funky formatting, so there will likely be some weird edge cases if we leave the HTML in.

Proposed behavior: Just replace every twitter.com with the Nitter instance, in both the text and the HTML, and see what happens.
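A rough sketch of one way to do this, not the bot's actual code: rather than discarding the markup, collect every `href` alongside the visible text and rewrite whichever candidates point at Twitter. The names `LinkCollector`, `to_nitter`, and `nitter_links`, the `nitter.net` instance, and the host list are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit

NITTER_HOST = "nitter.net"  # assumption: substitute whichever instance the bot uses
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}  # assumption


class LinkCollector(HTMLParser):
    """Collect every href and all visible text from a status's HTML content."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def to_nitter(url):
    """Return the Nitter equivalent of a Twitter URL, or None if it is not one."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return None  # malformed token, not a usable URL
    if (parts.hostname or "").lower() not in TWITTER_HOSTS:
        return None
    return urlunsplit(("https", NITTER_HOST, parts.path, parts.query, parts.fragment))


def nitter_links(content_html):
    """Find Twitter links in hrefs and in visible text; return unique Nitter URLs."""
    collector = LinkCollector()
    collector.feed(content_html)
    collector.close()
    # Joining text parts without a separator stitches together URLs that
    # Mastodon splits across several <span> elements.
    candidates = collector.hrefs + "".join(collector.text_parts).split()
    seen, result = set(), []
    for candidate in candidates:
        rewritten = to_nitter(candidate)
        if rewritten and rewritten not in seen:
            seen.add(rewritten)
            result.append(rewritten)
    return result
```

Checking the `href` values covers the case where the link text is shortened or hidden in spans. Scheme-less text such as `twitter.com/foo` is not picked up by this sketch, and whether crosspost bots need extra handling is still an open question.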

yuletide commented 1 year ago

Another one: https://botsin.space/@RobertMaguire@journa.host/109649227056480533

This seems to be a product of some crosspost bots, or it failed for some other reason.