twitter / twitter-text

Twitter Text Libraries. This code is used at Twitter to tokenize and parse text to meet the expectations for what can be used on the platform.
https://developer.twitter.com/en/docs/counting-characters
Apache License 2.0

Update documentation - specifically for CJK and Emoji #303

Closed: edent closed this issue 4 years ago

edent commented 4 years ago

The official documentation doesn't reflect reality, especially for Chinese, Japanese, and Korean text.

On https://developer.twitter.com/en/docs/basics/counting-characters the documentation says:

Tweet length is measured by the number of codepoints in the NFC normalized version of the text.
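For reference, here is a minimal sketch of the rule exactly as quoted, not of what twitter-text actually does. Python's `unicodedata.normalize` plus `len` counts code points after NFC normalization; the function name `nfc_codepoint_length` is hypothetical. Under this rule each CJK ideograph counts once and a ZWJ family emoji counts five times, which does not match the weighting and emoji rules quoted further down.

```python
import unicodedata

def nfc_codepoint_length(text: str) -> int:
    """Length under the quoted rule: number of code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))

print(nfc_codepoint_length("cafe\u0301"))                  # 4: "e" + combining acute composes to one code point
print(nfc_codepoint_length("日本語"))                        # 3: one per ideograph
print(nfc_codepoint_length("👩\u200d👩\u200d👧"))            # 5: three emoji joined by two ZWJs
```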

This is not quite true :-)

The Twitter-Text page - https://developer.twitter.com/en/docs/developer-utilities/twitter-text - says:

The Configuration defines Unicode code point ranges, with a weight associated with each of these ranges. This enables language density to be taken into consideration when counting characters.
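A sketch of how a weighted-range configuration like this can be applied. The range boundaries, the default weight of 200 and the scale of 100 are assumptions modeled on twitter-text's `config/v3.json`; check the shipped configuration before relying on them. With these assumed values, code points below U+1100 (Latin, Greek, Cyrillic, Hebrew, Arabic and other scripts encoded there) plus a few general punctuation ranges weigh one character, and everything else, including CJK ideographs and Hangul, weighs two. The function names are hypothetical.

```python
import unicodedata

# Assumed values, modeled on twitter-text's config/v3.json
SCALE = 100
DEFAULT_WEIGHT = 200
RANGES = [  # (start, end, weight), inclusive code point ranges
    (0x0000, 0x10FF, 100),  # everything below Hangul Jamo (U+1100)
    (0x2000, 0x200D, 100),  # spaces and joiners
    (0x2010, 0x201F, 100),  # dashes and quotation marks
    (0x2032, 0x2037, 100),  # primes
]

def codepoint_weight(cp: int) -> int:
    """Weight of one code point: the matching range's weight, else the default."""
    for start, end, weight in RANGES:
        if start <= cp <= end:
            return weight
    return DEFAULT_WEIGHT

def weighted_length(text: str) -> int:
    """Sum per-code-point weights of the NFC text, then divide by the scale (floored here)."""
    normalized = unicodedata.normalize("NFC", text)
    return sum(codepoint_weight(ord(ch)) for ch in normalized) // SCALE

print(weighted_length("hello"))   # 5
print(weighted_length("Привет"))  # 6
print(weighted_length("日本"))     # 4
```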

But it doesn't explain which high-density languages are included. Nor does it explain Emoji.

Reading between the lines

In the blog post announcing 280 characters, you say:

We want every person around the world to easily express themselves on Twitter, so we're doing something new: we're going to try out a longer limit, 280 characters, in languages impacted by cramming (which is all except Japanese, Chinese, and Korean).
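Assuming the same kind of weights (one for light-range code points, two for everything else), the blog post's exception falls out of the arithmetic: a 280 limit admits 280 Latin characters but only 140 weight-two characters such as CJK. The hypothetical `fits_in_tweet` below compares the raw weighted sum against 280 * 100, so no rounding is involved; the constants are assumptions modeled on twitter-text's v3 configuration.

```python
# Assumed values, modeled on twitter-text's v3 configuration
MAX_WEIGHTED_LENGTH = 280
SCALE = 100
DEFAULT_WEIGHT = 200
LIGHT_RANGES = [(0x0000, 0x10FF), (0x2000, 0x200D), (0x2010, 0x201F), (0x2032, 0x2037)]

def weight(cp: int) -> int:
    """100 for code points inside a light range, the default weight otherwise."""
    return 100 if any(start <= cp <= end for start, end in LIGHT_RANGES) else DEFAULT_WEIGHT

def fits_in_tweet(text: str) -> bool:
    """True if the unscaled weighted sum stays within the limit."""
    return sum(weight(ord(ch)) for ch in text) <= MAX_WEIGHTED_LENGTH * SCALE

print(fits_in_tweet("a" * 280))  # True
print(fits_in_tweet("a" * 281))  # False
print(fits_in_tweet("日" * 140))  # True
print(fits_in_tweet("日" * 141))  # False
```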

Hidden in the POST statuses/update documentation, it says:

To make room for more expression, we will now count all emojis as equal—including those with gender and skin tone modifiers 👍🏻👍🏽👍🏿. This is now reflected in Twitter-Text, our Open Source library.
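A rough sketch of what "count all emojis as equal" can mean in practice: a base emoji plus any Fitzpatrick skin-tone modifiers (U+1F3FB to U+1F3FF), the emoji presentation selector U+FE0F and ZWJ-joined components is folded into one unit of weight 200, i.e. two characters. The `is_emoji_start` test is a deliberately crude, hypothetical block-range check, not the full Unicode emoji grammar that twitter-text parses, and the weights are assumptions modeled on twitter-text's v3 configuration.

```python
import unicodedata

# Assumed values, modeled on twitter-text's v3 configuration
SCALE = 100
DEFAULT_WEIGHT = 200
LIGHT_RANGES = [(0x0000, 0x10FF), (0x2000, 0x200D), (0x2010, 0x201F), (0x2032, 0x2037)]

SKIN_TONES = range(0x1F3FB, 0x1F400)  # Fitzpatrick modifiers U+1F3FB..U+1F3FF
ZWJ = 0x200D                           # zero width joiner
VS16 = 0xFE0F                          # emoji presentation selector

def is_emoji_start(cp: int) -> bool:
    """Hypothetical, crude test: main pictograph blocks plus Misc Symbols and Dingbats."""
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF

def codepoint_weight(cp: int) -> int:
    return 100 if any(s <= cp <= e for s, e in LIGHT_RANGES) else DEFAULT_WEIGHT

def weighted_length(text: str) -> int:
    cps = [ord(ch) for ch in unicodedata.normalize("NFC", text)]
    total, i = 0, 0
    while i < len(cps):
        if is_emoji_start(cps[i]):
            i += 1
            # Fold modifiers, VS16 and ZWJ-joined components into this one emoji
            while i < len(cps):
                if cps[i] in SKIN_TONES or cps[i] == VS16:
                    i += 1
                elif cps[i] == ZWJ and i + 1 < len(cps):
                    i += 2  # skip the joiner and the component it joins
                else:
                    break
            total += DEFAULT_WEIGHT  # every emoji sequence costs the same
        else:
            total += codepoint_weight(cps[i])
            i += 1
    return total // SCALE

print(weighted_length("👍"))          # 2
print(weighted_length("👍🏻👍🏽👍🏿"))   # 6
print(weighted_length("👩\u200d💻"))  # 2
```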

But I don't see that in the official documentation anywhere.

Suggestion

andypiper commented 4 years ago

Thanks, this has been on our (my) backlog for a while, hence the line on the counting characters page saying the documentation would be updated soon (!!) ... sorry about that! This is good feedback. I'll move it up my list for attention.

Internal - Jira DAPS-637

andypiper commented 4 years ago

Thanks for the feedback! Happy to say that the counting characters page has now been completely overhauled, we appreciate your support!

https://developer.twitter.com/en/docs/basics/counting-characters

(the twitter-text page on the dev site will follow in the future)