robertoszek / pleroma-bot

Bot for mirroring one or multiple Twitter accounts in Pleroma/Mastodon/Misskey.
https://robertoszek.github.io/pleroma-bot
MIT License

custom text replacements #105

Closed. selyod-ka closed this issue 1 year ago

selyod-ka commented 1 year ago

hey,

i've made a modification to pleroma-bot to support custom text replacements. i use this because twitter handles and mastodon handles sometimes differ. with these replacements, a mention can still link to the matching account on the same mastodon instance even when its handle there is not the same as on twitter. i've added:

git diff
diff --git a/pleroma_bot/_processing.py b/pleroma_bot/_processing.py
index 6747a8b..c24c3dc 100644
--- a/pleroma_bot/_processing.py
+++ b/pleroma_bot/_processing.py
@@ -142,6 +142,13 @@ def process_tweets(self, tweets_to_post):
                 "https://youtube.com",
                 self.invidious_base_url
             )
+        if hasattr(self, "custom_replacements"):
+            if self.custom_replacements:
+                tweet["text"] = _custom_replacements(
+                    self,
+                    tweet["text"],
+                    self.custom_replacements
+                )
         signature = ''
         if self.signature:
             if self.archive:
@@ -376,6 +383,16 @@ def _replace_url(self, data, url, new_url):
         data = re.sub(match, new_url, data)
     return data

+def _custom_replacements(self, data, custom_replacements):
+    # we will do case-insensitive matching. but in order to lookup
+    # the replacement value in the custom_replacements-dict, we need
+    # to normalize dict's keys (lower-case, here).
+    lookup = {k.lower(): v for k, v in custom_replacements.items()}
+    # alternation regex on all keys, escaped so they match literally
+    pattern = re.compile("|".join(map(re.escape, lookup)), re.IGNORECASE)
+    # single pass, so text that was already replaced is not matched again
+    data = pattern.sub(lambda m: lookup[m.group().lower()], data)
+    return data

 def _remove_status_links(self, tweet):
     regex = r"\bhttps?:\/\/twitter.com\/+[^\/:]+\/.*?status\/\d*\b"
diff --git a/pleroma_bot/cli.py b/pleroma_bot/cli.py
index ca247c9..d5103b7 100644
--- a/pleroma_bot/cli.py
+++ b/pleroma_bot/cli.py
@@ -92,6 +92,7 @@ class User(object):
     from ._processing import _download_media
     from ._processing import _replace_mentions
     from ._processing import _get_best_bitrate_video
+    from ._processing import _custom_replacements

     def __init__(self, user_cfg: dict, cfg: dict, base_path: str):
         self.posts = None
@@ -138,6 +139,7 @@ class User(object):
             "invidious_base_url": "https://yewtu.be/",
             "archive": None,
             "visibility": None,
+            "custom_replacements": {},
             "max_post_length": 5000,
             "include_quotes": True,
             "website": None,

furthermore, i have a section in config.yml consisting of key-value-pairs under the following key:

custom_replacements:
    "@replacethishandle": "@withthishandle"
    ....
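
To try the replacement logic outside the bot, here is a minimal standalone sketch. It is not the code from the diff: the function name `custom_replacements` (without the leading underscore and without `self`) and the longest-key-first ordering are assumptions made for illustration. It escapes the keys, so characters such as `.` match literally, and it substitutes in a single pass.

```python
import re


def custom_replacements(text, replacements):
    """Replace every key of `replacements` found in `text`, ignoring case.

    Hypothetical standalone helper, not part of pleroma-bot.
    """
    if not replacements:
        return text
    # Lower-case the keys so a case-insensitive match can be looked up.
    lookup = {k.lower(): v for k, v in replacements.items()}
    # Longest keys first, so "@foobar" wins over "@foo" in the alternation.
    keys = sorted(lookup, key=len, reverse=True)
    # re.escape makes each key match literally.
    pattern = re.compile("|".join(re.escape(k) for k in keys), re.IGNORECASE)
    # One pass over the text: replaced text is not scanned again.
    # Matches that cannot be looked up are left unchanged.
    return pattern.sub(
        lambda m: lookup.get(m.group().lower(), m.group()), text
    )


if __name__ == "__main__":
    mapping = {"@replacethishandle": "@withthishandle"}
    print(custom_replacements("Thanks @ReplaceThisHandle!", mapping))
```

Because the substitution is a single pass, a mapping that swaps two handles (`"@a"` to `"@b"` and `"@b"` to `"@a"`) swaps them cleanly instead of collapsing both into one.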

robertoszek commented 1 year ago

Very nice! Looks pretty good to me: extensible, very useful, and applicable to other use cases too, not just Twitter handles (you could set up custom replacements for keywords or hashtags, just as an example).

Do you want to create a pull request to the develop branch for this? Or, if you don't have the time or energy, I could include these changes myself. I just want to make sure your contribution is properly credited. I will mention it in the release notes regardless.

Thank you for your contribution!

On a somewhat related note, seeing this makes me think of a potential new feature. Perhaps we could also let people include content warnings if certain keywords are present in the tweet text. If a match is found, include the appropriate spoiler_text (or cw for Misskey):
https://docs.joinmastodon.org/methods/statuses/#form-data-parameters
https://misskey.io/api-doc#operation/notes/create

Something along the lines of this in the config:

content_warnings:
    "War, politics":
      - "war"
      - "election"
      - "politics"
      - "far-right"
      - "far-left"
    "Sexual abuse":
      - "sexual harassment"
      - "#metoo"

This would result in the content warning being added if any of those keywords are found: [image]

Maybe it's worth implementing.
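
A rough sketch of how the keyword matching could work, assuming the `content_warnings` section is loaded as a dict that maps each warning label to a list of keywords. The function names `find_content_warnings` and `build_spoiler_text` are hypothetical, and so are the whole-word matching and the joining of several labels with commas; none of this is existing pleroma-bot code.

```python
import re

# Hypothetical config, mirroring the proposed content_warnings section.
CONTENT_WARNINGS = {
    "War, politics": ["war", "election", "politics", "far-right", "far-left"],
    "Sexual abuse": ["sexual harassment", "#metoo"],
}


def find_content_warnings(text, content_warnings):
    """Return the labels whose keywords occur in `text` (case-insensitive)."""
    labels = []
    for label, keywords in content_warnings.items():
        # (?<!\w) and (?!\w) keep "war" from matching inside "software".
        pattern = "|".join(
            rf"(?<!\w){re.escape(k)}(?!\w)" for k in keywords
        )
        if pattern and re.search(pattern, text, re.IGNORECASE):
            labels.append(label)
    return labels


def build_spoiler_text(text, content_warnings):
    """Join the matching labels into one string, or None if none matched."""
    labels = find_content_warnings(text, content_warnings)
    return ", ".join(labels) if labels else None


if __name__ == "__main__":
    print(build_spoiler_text("The election is next week", CONTENT_WARNINGS))
```

The returned string would then be sent as spoiler_text for Mastodon/Pleroma or as cw for Misskey; a `None` result means the post goes out without a content warning.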

selyod-ka commented 1 year ago

Nice that you like it! If you don't mind, please include the changes yourself.

robertoszek commented 1 year ago

Sure thing! I'll add the changes and credit you in the release notes.

Thanks once again for contributing and have a nice day!