tdlib / td

Cross-platform library for building Telegram clients
https://core.telegram.org/tdlib
Boost Software License 1.0
7.11k stars 1.44k forks

Offset value is incorrect in case Message begins with emoji. #2984

Closed by tilvin 3 months ago

tilvin commented 3 months ago

Hello, could someone please help me? I am facing this problem: when the message string begins with an emoji, the entity offset seems to be wrong. For example, I send the message 😀qwerty, with "werty" in bold, and I receive a Message with content:

_content=#<TD::Types::MessageContent::Text text=#<TD::Types::FormattedText text="😀qwerty" entities=[#<TD::Types::TextEntity offset=3 length=5 type=#>]> web_page=nil> replymarkup=nil>

It says that the bold text in my message begins at symbol number 3. So I insert the <b> tag at the third symbol of the string, and the result is 😀qwerty, with the bold text starting in the wrong place.

When there is no emoji at the beginning of the string, the offset is correct. I send the message qwerty, with "werty" in bold, and I receive a Message with content:

_content=#<TD::Types::MessageContent::Text text=#<TD::Types::FormattedText text="qwerty" entities=[#<TD::Types::TextEntity offset=1 length=5 type=#>]> web_page=nil> replymarkup=nil>

So the bold text starts at symbol 1, which is correct.

levlam commented 3 months ago

See the documentation for the fields:

@offset Offset of the entity, in UTF-16 code units
@length Length of the entity, in UTF-16 code units

See https://en.wikipedia.org/wiki/UTF-16 for a description of the UTF-16 encoding and how to handle the offset correctly.
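
To illustrate the point: 😀 (U+1F600) lies outside the Basic Multilingual Plane, so it takes two UTF-16 code units, while Ruby string indexing counts it as one character. That is why offset=3 is correct for "😀qwerty" even though "w" is at character index 2. Below is a minimal Ruby sketch of one way to convert the offsets. The helper names utf16_slice, utf16_offset_to_char_index and wrap_entity are hypothetical and are not part of TDLib or tdlib-ruby; the sketch assumes the text is a valid UTF-8 string and that entities do not overlap.

```ruby
# Hypothetical helpers (not part of TDLib or tdlib-ruby).

# Return the substring covered by an entity whose offset and length
# are given in UTF-16 code units.
def utf16_slice(text, offset, length)
  utf16 = text.encode("UTF-16LE")
  # Every UTF-16 code unit occupies exactly 2 bytes.
  utf16.byteslice(offset * 2, length * 2).encode("UTF-8")
end

# Convert a UTF-16 code-unit offset into a Ruby character index.
def utf16_offset_to_char_index(text, offset)
  prefix = text.encode("UTF-16LE").byteslice(0, offset * 2)
  prefix.encode("UTF-8").length
end

# Wrap the entity in the given tags, using character indices
# derived from the UTF-16 offsets.
def wrap_entity(text, offset, length, open_tag, close_tag)
  start  = utf16_offset_to_char_index(text, offset)
  finish = utf16_offset_to_char_index(text, offset + length)
  text[0...start] + open_tag + text[start...finish] + close_tag + text[finish..-1]
end

puts utf16_slice("😀qwerty", 3, 5)
puts wrap_entity("😀qwerty", 3, 5, "<b>", "</b>")
puts wrap_entity("qwerty", 1, 5, "<b>", "</b>")
```

With several entities in one message, applying them from the highest offset to the lowest keeps the earlier offsets valid, since inserted tags only shift the text that follows them.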

tilvin commented 3 months ago

Thank you!

tilvin commented 3 months ago

If anyone faces the same problem and is using Ruby on Rails, please see my comment at https://github.com/southbridgeio/tdlib-ruby/issues/66