Discord rich presence plugin shows an error if song title is too long

Miss-Inputs commented 1 year ago

Steps to reproduce

Activate Discord plugin
Start playing a song that has a fairly long title. In this case I am listening to Cowgirl / Synaesthesia (original En Motion mix) / Deepest Blue (a cappella) (Underworld vs. The Thrillseekers vs. Deepest Blue) by The Cut Up Boys

Expected Output

If the resulting status is too long, it should be truncated to whatever Discord thinks is the maximum width

Actual Output

ServerError: Child "activity" fails because child "details" fails because "details" length must be less than or equal to 128 characters long
------
Traceback (most recent call last):

  File "/usr/lib/python3/dist-packages/quodlibet/plugins/events.py", line 140, in __invoke
    handler(*args)

  File "/home/megan/.config/quodlibet/plugins/discord_status.py", line 95, in plugin_on_song_started
    self.handle_play()

  File "/home/megan/.config/quodlibet/plugins/discord_status.py", line 82, in handle_play
    self.update_discordrp(details, state)

  File "/home/megan/.config/quodlibet/plugins/discord_status.py", line 64, in update_discordrp
    self.discordrp.update(details=details, state=state,

  File "/home/megan/.local/lib/python3.10/site-packages/pypresence/presence.py", line 36, in update
    return self.loop.run_until_complete(self.read_output())

  File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()

  File "/home/megan/.local/lib/python3.10/site-packages/pypresence/baseclient.py", line 88, in read_output
    raise ServerError(payload["data"]["message"])

pypresence.exceptions.ServerError: Child "activity" fails because child "details" fails because "details" length must be less than or equal to 128 characters long

Discord status is not updated, and is still the previous song that was played or Paused or whatever it was before.

Test System

Which version of Quod Libet?

4.5.0 but Discord plugin was taken from Git commit 721ff2e3f2273537988b73fea7071b6390b1f34a

Which operating system

Xubuntu 22.04

If it's audio-related, what back-end?

N/A

Additional Information

afontenot commented 1 year ago

Determining the length of strings is surprisingly complicated. A trivial looking fix like

if len(s) > 128:
    s = s[:125] + "..."

might fail, because Python's string type indexes by code points. Anyone fixing this should

Check that the Discord API and the pypresence library support being passed utf-8 strings - e.g. that neither of them assumes ASCII / latin-1.
Check whether the Discord API measures the string length in bytes, code points, or graphemes. For instance, in utf-8, "🏳️‍🌈" is 1 grapheme, 4 code points, and 14 bytes. A good way to test this is to try setting the Discord "rich presence" details field to "🏳️‍🌈" * 128, and if that fails, try "🏳️‍🌈" * 32, and if that fails, try "🏳️‍🌈" * 9. (It's also possible they're allowing 128 wide 2-byte units, with an internal utf-16 representation.)
- Note that Discord is completely inconsistent about this. The error message (above) says <= 128 characters, their documentation says 128 bytes, and the example they give on the same page allocates a 256 byte buffer. Best to find out what the real limit is through actual testing.
Break the string appropriately depending on what Discord allows. You might consider splitting on graphemes instead of code points if you can find a good library for that. If Discord counts bytes, not characters, and the (encoded) string doesn't fit, iterate over the graphemes, and fill a new string with them until the next grapheme would take you over 125 bytes. Then add "...".

Note that if you decide not to do grapheme splitting, you still need to be careful if there's a byte size limit not to break in the middle of a code point. Something like x.encode("utf-8")[:125].decode("utf-8", errors="ignore") + "..." should suffice.

I can't work on this because I don't use Discord. Happy to help review a PR though.

nemethf commented 1 year ago

Should this be solved upstream instead? Like self.discordrp.update(details=details, state=state, truncate=True, ...), ie by adding a new argument truncate, always_fit, or something like that.

afontenot commented 1 year ago

Should this be solved upstream instead?

I think trying to get a fix in pypresence is probably the reasonable way to try to fix this, if they'll accept it.

Either way, roughly the same code is needed. E.g. if you care about not breaking graphemes (e.g. turning 🏳️‍🌈 into 🏳️ after truncation), you have to use a unicode aware grapheme splitting library. And best to figure out what Discord's actual limit is (128 bytes? 128 characters? 128 code points? 128 wchar_t? Something else?) with actual testing.

nemethf commented 1 year ago

(On the one hand, I find it surprising it is not possible to detect incomplete graphemes. On the other hand, I think this could potentially lead to hilarious misunderstandings.)

teeleafs commented 1 year ago

I'll admit I went by the "128 characters" message and had tested with Latin and Japanese characters, and did not bother to test with emoji and flags. I ran more tests:

"a" 128 -> OK "a" 129 -> Nope "λ" 128 -> OK "λ" 129 -> Nope "シ" 128 -> OK "シ" 129 -> Nope "😺" 128 -> Nope "😺" 64 -> OK "😺" 65 -> Nope "🏳️‍🌈" 128 -> Nope "🏳️‍🌈" 21 -> OK "🏳️‍🌈" 22 -> Nope

I'm not too familiar with Unicode encoding, so I'm all ears for suggestions.

Miss-Inputs commented 1 year ago

Would it make more sense to truncate at 127 characters instead, or 126 (whatever is the maximum amount of characters where you can guarantee that adding an ellipsis would not reach 128 bytes in any reasonable scenario), to avoid the issue entirely? Gets the message across (all my friends will know I am very cool for listening to a certain song) without running into edge cases. Although, I'm not sure what song titles are that long and have a gay pride flag in their title anyway… now I have to find one LOL

teeleafs commented 1 year ago

Well going by my tests the limit seems to be 21 gay pride flags, and we can't truncate everything down to 21 now can we? I'm not sure but Wiki suggests flag emojis take up 2 code points and hence 8 bytes? It doesn't seem to match up with the rest of the tests where each "character" take up (1, 2, 3, 4) bytes assuming UTF-8?

afontenot commented 1 year ago

I'm not sure but Wiki suggests flag emojis take up 2 code points and hence 8 bytes? It doesn't seem to match up with the rest of the tests where each "character" take up (1, 2, 3, 4) bytes assuming UTF-8?

In UTF-8 "characters" (meaning graphemes) can use an unlimited number of bytes. You are thinking of code points - all UTF-8 code points take between 1 and 4 bytes. A fully qualified 🏳️‍🌈 flag takes 14 bytes in UTF-8 and 12 bytes in UTF-16.

Thank you for the tests. Based on your work, I believe I can say what the Discord limits are. They allow 256 bytes of UTF-16 encoded text, not including any byte order mark (BOM) at the beginning. That is 128 ASCII characters when encoded in UTF-16.

The best you can do without grapheme splitting looks something like this:

def trim_text_utf16(s, max_bytes=256):
    # use _le tagged encoding to avoid BOM insertion
    if len(s.encode("utf_16_le")) <= max_bytes:
        return s
    result = ""
    byte_size = 0
    for cp in s:
        byte_size += len(cp.encode("utf_16_le"))
        # 2 = len("…".encode("utf_16_le"))
        if byte_size > max_bytes - 2:
            return result + "…"
        result += cp

Call the function like this:

>>> trim_text_utf16("a" * 1000)
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…'
>>> trim_text_utf16("aaa" + "🏳️‍⚧️" * 1000)
'aaa🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍⚧️🏳️‍…'

Note the borked flag at the end - the result of splitting by code point instead of by grapheme.

This may be considered good enough given the relative rarity of long titles. I would present this information to the upstream developer (pypresence) and see if they want to fix it.

If they want to do grapheme splitting there are a few packages for it. I think regex can technically do it although I don't know if it stays up to date with Unicode (IIRC it has the character classes hardcoded in its source code). The "right" way to do this is probably pyicu. Otherwise, I'd use the trim_text_utf16 function above.

quodlibet / quodlibet