telegramdesktop / tdesktop

Telegram Desktop messaging app
https://desktop.telegram.org/

Combining diacritics get stripped #6362

Open ralesk opened 5 years ago

ralesk commented 5 years ago

Telegram Desktop strips (some?) combining diacritics entirely, making it hard to send complex IPA, for example.

This is only marginally related to #2651, which was about text rendering. It may also cause problems when sharing file names from Macs, which use decomposed characters, at least for accented Latin letters.
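To illustrate the decomposed-character point, here is a minimal Python sketch using the standard `unicodedata` module. A precomposed é is a single code point, while its NFD (decomposed) form is a base letter followed by a combining mark, which is exactly the kind of sequence a client that drops or replaces combining marks would damage. The `code_points` helper is a hypothetical name used only for this illustration.

```python
import unicodedata


def code_points(s: str) -> list[str]:
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


precomposed = "\u00e9"  # é as a single code point (NFC)
decomposed = unicodedata.normalize("NFD", precomposed)  # e + combining acute accent

print(code_points(precomposed))  # ['U+00E9']
print(code_points(decomposed))   # ['U+0065', 'U+0301']

# The two forms are canonically equivalent: NFC folds the pair back into one code point.
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

Note that U+0301 itself is not one of the code points singled out later in this thread; the sketch only shows what decomposed text looks like at the code point level.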

Steps to reproduce

  1. Try to type/paste/etc. anything that's a letter + a combining accent, for example: o̿ (o with double overline above)
  2. Send the message
  3. Notice that:
     a. the message does not seem to contain the diacritic;
     b. upon editing, the diacritic is missing.

Expected behaviour

The message should not be altered and the result should be o̿.

Actual behaviour

The message is altered and the result is o instead.

Configuration

Operating system: Linux, Fedora 29, MATE desktop
Version of Telegram Desktop: 1.7.14

Aokromes commented 5 years ago

https://github.com/telegramdesktop/tdesktop/issues/1041

ralesk commented 5 years ago

No, this is not a keyboard input issue. Not related to #1041.

Aokromes commented 5 years ago

"When I normally write in all the rest of the programs, if I hit ' and e, I see é.

However, in Telegram app it appears the e without the tilde."

ralesk commented 5 years ago

That issue is about keyboard input, and in particular compose key (the X11 way of having multiple keystrokes result in a single letter) and/or a dead key (another way of having you press a sequence of keys to end up with a single letter) not being honoured by the input widget in Telegram and/or Telegram's Qt.

This issue is about character sequences (as opposed to keypress sequences), where you have literal characters in the paste buffer and Telegram or Telegram's Qt stripping so-called combining characters, which do not appear in the other issue whatsoever.

ralesk commented 4 years ago

So?

ralesk commented 4 years ago

Some combining diacritics get stripped. What, why, how. (Probably a Qt issue?)

o̿wo̿ gets stripped and rendered as o w o (note the space), while uvͮu doesn't get stripped and is rendered as is.

Anyway, let's look at the entire combining range for shits and giggles:

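```perl
use strict;
use warnings;
# Print every combining diacritic (U+0300..U+036F) between "a" and "x", four per line.
binmode STDOUT, ":encoding(UTF-8)";
for my $cp (0x0300 .. 0x036f) {
    printf "U+%04x    a%sx    ", $cp, chr($cp);
    print "\n" if $cp % 4 == 3;
}
```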

This renders perfectly (as far as the fonts allow) in Discord:

[two screenshots]

And there are multiple things that happen in Telegram:

[screenshot]

Note how these are still good (except for the a + double grave) in the input box before sending... and they're mangled after sending (including when trying to edit again):

[screenshot]

Here it is in fixed width so it's easier to spot (with fewer spaces):

[screenshot]

I wonder what is so special about code points U+030A, U+0333 and U+033F that Telegram or Qt mangles them. I wonder if there are any more Unicode characters out there that get this treatment.
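As for what makes these three special: from Unicode's point of view, apparently nothing. A quick check with Python's standard `unicodedata` module suggests they are ordinary nonspacing marks, with the same general category as every other code point in the Combining Diacritical Marks block, so the cause is more likely a hard-coded list somewhere than any Unicode property.

```python
import unicodedata

# The three code points reported above as being mangled.
REPORTED = (0x030A, 0x0333, 0x033F)

for cp in REPORTED:
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))

# Every code point in the Combining Diacritical Marks block (U+0300..U+036F)
# has the same general category, Mn (nonspacing mark), so the category alone
# does not single these three out.
assert all(unicodedata.category(chr(cp)) == "Mn" for cp in range(0x0300, 0x0370))
```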

ralesk commented 4 years ago

P.S. Considering that Konsole (a Qt/KDE terminal app) doesn't mess it up, and neither do Clementine or Gwenview, maybe it's not a Qt issue after all...

eternal-sorrow commented 4 years ago

They're not getting stripped, just rendered as whitespace. You can successfully copy the incorrectly rendered text and paste it into another application, retaining all the "stripped" diacritics.

ralesk commented 4 years ago

I have just copied that message into this GitHub entry box and the accent is not present, whereas copying it from Discord (where it doesn't get mangled) works. So no, it's not a display issue: the character is getting replaced by whitespace.
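One way to settle the display-versus-replacement question is to dump the code points of the text copied back out of the client. If the combining mark is still there, the problem is rendering; if U+0020 SPACE sits where the mark used to be, the character was actually replaced. A minimal Python sketch; `dump` and `combining_marks` are hypothetical helper names, and the sample string stands in for whatever was pasted.

```python
import unicodedata


def dump(text: str) -> list[str]:
    """Describe each code point of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def combining_marks(text: str) -> list[str]:
    """Return the characters in text whose general category is a mark (Mn, Mc, Me)."""
    return [ch for ch in text if unicodedata.category(ch).startswith("M")]


# Paste the text copied back out of the client here and compare with what was sent.
copied = "o\u033fwo\u033f"
for line in dump(copied):
    print(line)
print("marks present:", bool(combining_marks(copied)))
```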

ralesk commented 4 years ago

Of course, since @Aokromes mistakenly closed it and still hasn't reopened it, it has even less of a chance of ever getting noticed; not that anything ever gets noticed here anyway.

ilya-fedin commented 3 years ago

"I wonder what is so special about code points U+030A, U+0333 and U+033F that Telegram or Qt mangles them."

These code points are present in the IsReplacedBySpace method: https://github.com/desktop-app/lib_ui/blob/d4c99701b5210a2db83b1c0f13da1a62f48dfb80/ui/text/text.cpp#L3444-L3457

I found this ticket by pure accident.
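For illustration, the effect of a replace-by-space list like this can be sketched in Python. `REPLACED_BY_SPACE`, `is_replaced_by_space` and `sanitize` are hypothetical names; the set holds only the three code points identified in this thread and is not a copy of the linked lib_ui function, whose actual contents may differ. Substituting U+0020 for each flagged mark (rather than deleting it) would explain why a space shows up where the diacritic used to be.

```python
# Hypothetical subset: only the three code points identified in this thread,
# not the full contents of lib_ui's IsReplacedBySpace.
REPLACED_BY_SPACE = frozenset({0x030A, 0x0333, 0x033F})


def is_replaced_by_space(ch: str) -> bool:
    """Return True if ch is in the hypothetical replacement set."""
    return ord(ch) in REPLACED_BY_SPACE


def sanitize(text: str) -> str:
    """Replace every flagged character with U+0020, keeping the length unchanged."""
    return "".join(" " if is_replaced_by_space(ch) else ch for ch in text)


print(repr(sanitize("o\u033fwo\u033f")))  # 'o wo '
print(repr(sanitize("uv\u036eu")))        # unchanged: U+036E is not in the set
```

Other combining marks pass through untouched, e.g. U+036E in uvͮu, which matches the earlier observation that this one is not affected.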

ralesk commented 3 years ago

Thank you! Feels good to be proven right.

stale[bot] commented 3 years ago

Hey there!

This issue was inactive for a long time and will be automatically closed in 30 days if there isn't any further activity. We therefore assume that the user has lost interest or resolved the problem on their own.

Don't worry though; if this is an error, let us know with a comment and we'll be happy to reopen the issue.

Thanks!

eternal-sorrow commented 3 years ago

The issue is still present.

stale[bot] commented 2 years ago

[Same automatic inactivity notice as above.]

eternal-sorrow commented 2 years ago

Still having this issue

stale[bot] commented 2 years ago

[Same automatic inactivity notice as above.]

eternal-sorrow commented 2 years ago

The issue is still there.

github-actions[bot] commented 1 year ago

[Same automatic inactivity notice as above.]

eternal-sorrow commented 1 year ago

Nothing changed.

ralesk commented 1 year ago

My favourite bit about this — besides that automatic closing of issues shouldn't be a thing — is that git blame just says "initial commit" and nobody knows why on Earth those codepoints are even in that list of bad codepoints. That function makes so little sense...

ilya-fedin commented 1 year ago

@ralesk Some of those functions ensure that the custom widgets won't render incorrectly because of some nasty character; others replace characters the same way the server does, so that tdesktop has valid offsets without re-downloading the sent message. It's unlikely those replacements will ever be revisited, given that everyone is afraid to touch that part of the tdesktop code (the chance of big regressions is too high). You can treat this issue as an architectural one that will likely persist for tdesktop's entire lifetime.
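The offsets point can be made concrete. As far as I know, Telegram addresses message entities (bold, links, mentions) by offset and length in UTF-16 code units, so whatever transformation the server applies to the text, a client that wants to reuse its local copy has to apply the same one, or the offsets stop lining up. A minimal Python sketch contrasting a length-preserving replacement with a length-changing deletion; the entity and the `utf16_slice` helper are hypothetical and not taken from tdesktop.

```python
def utf16_slice(s: str, offset: int, length: int) -> str:
    """Extract a substring addressed in UTF-16 code units, as entity offsets are."""
    data = s.encode("utf-16-le")
    return data[2 * offset : 2 * (offset + length)].decode("utf-16-le")


sent = "Te\u033fst bold"
# Hypothetical "bold" entity over the last word, computed on the text as sent.
offset, length = 6, 4
assert utf16_slice(sent, offset, length) == "bold"

replaced = sent.replace("\u033f", " ")  # one code unit swapped for one code unit
stripped = sent.replace("\u033f", "")   # one code unit removed

print(utf16_slice(replaced, offset, length))  # prints bold: the entity still lines up
print(utf16_slice(stripped, offset, length))  # prints old: the entity is now shifted
```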

github-actions[bot] commented 1 year ago

[Same automatic inactivity notice as above.]

Neurotoxin001 commented 1 year ago

+

ralesk commented 1 year ago

@ilya-fedin I don't think #8140 is related; diacritics aren't getting stripped there, just badly displayed by Qt (and/or the font).

ilya-fedin commented 1 year ago

I remember checking the code points between the characters, and it was using ones that are in the lib_ui blacklist.

github-actions[bot] commented 10 months ago

[Same automatic inactivity notice as above.]

john-preston commented 3 months ago

It looks like the server is stripping them: I can't send them from one Android phone and see them in the received message on another Android phone.

'Te̊st' 'Te̳st' 'Te̿st'

john-preston commented 3 months ago

[screenshot: Screenshot_20240528_113832_Telegram]

qigel commented 6 days ago

Desktop still breaks diacritics made with combining characters [screenshot]. In the Android version it looks fine.