likerRr / smscountjs

Get sms count (7-bit, 8-bit, 16-bit)
http://likerrr.github.io/smscountjs/

Character counting of long messages is all wrong #1

Open babca opened 8 years ago

babca commented 8 years ago

Long SMS have fewer characters available per part because of the bigger UDH. The source code counts long SMS incorrectly, and the examples and screenshots are wrong too. Also, UTF-16 is not used in SMS; UCS-2 is.
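As background to the point about the UDH, here is a minimal sketch of the arithmetic. It assumes a 140-octet payload per SMS and the common 6-octet concatenation UDH (8-bit reference number); `perPartCapacity` is a hypothetical helper, not part of smscountjs.

```javascript
// Payload of a single SMS in octets.
const PAYLOAD_OCTETS = 140;

// Hypothetical helper: units left for text in one message once a UDH
// of `udhOctets` octets is placed in front of it.
function perPartCapacity(udhOctets) {
  const octets = PAYLOAD_OCTETS - udhOctets;
  return {
    gsm7: Math.floor((octets * 8) / 7), // septets
    octet8: octets,                     // octets
    ucs2: Math.floor(octets / 2),       // 16-bit units
  };
}

console.log(perPartCapacity(0)); // no UDH:      { gsm7: 160, octet8: 140, ucs2: 70 }
console.log(perPartCapacity(6)); // 6-octet UDH: { gsm7: 153, octet8: 134, ucs2: 67 }
```

With a 7-octet UDH (16-bit reference number) the same arithmetic would give 152 / 133 / 66.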

likerRr commented 8 years ago

@babca thx for notes, but could you please provide real examples?

babca commented 8 years ago

It was just a quick comment :) As far as I can remember, it should be like this:

GSM-7
1 msg ... 160 septets max
2 msgs ... 153+153 septets max
3 msgs ... 153+153+153 septets max
4 msgs ... 153+153+153+153 septets max
and so on.

Some characters, like [ ] { } €, need an escape character, so they occupy 2 positions. See the extension table here: https://en.wikipedia.org/wiki/GSM_03.38 You may notice that more 7-bit alphabets are defined for different languages, but it is probably sufficient to support the default one.
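A minimal sketch of how the escape rule affects counting, assuming the text is already known to be GSM-7 encodable. The set below lists the characters of the basic extension table; `countSeptets` is a hypothetical helper, not the library's API.

```javascript
// Characters from the GSM 03.38 basic extension table. Each one is sent
// as ESC + character, so it costs two septets instead of one.
const GSM7_EXTENDED = new Set(["^", "{", "}", "\\", "[", "~", "]", "|", "€", "\f"]);

// Hypothetical helper: septets needed for `text`.
// Assumption: every character is in the GSM-7 default alphabet or its
// extension table; no validation is done here.
function countSeptets(text) {
  let septets = 0;
  for (const ch of text) {
    septets += GSM7_EXTENDED.has(ch) ? 2 : 1;
  }
  return septets;
}

console.log(countSeptets("hello"));  // 5
console.log(countSeptets("[10 €]")); // 9 (three escaped characters count twice)
```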

8-bit
1 msg ... 140 octets max
2 msgs ... 134+134 octets max
3 msgs ... 134+134+134 octets max
4 msgs ... 134+134+134+134 octets max
and so on.
(Just for curiosity: afaik 8-bit coding is not meant for text.)

UCS-2 (16-bit characters)
1 msg ... 70 UCS-2 characters max
2 msgs ... 67+67 UCS-2 characters max
3 msgs ... 67+67+67 UCS-2 characters max
4 msgs ... 67+67+67+67 UCS-2 characters max
and so on.
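The limits quoted above can be turned into a message count roughly like this. It is only a sketch: `LIMITS` and `countParts` are hypothetical names, and the input is assumed to be an already computed number of septets, octets, or 16-bit units.

```javascript
// Limits from the comment above: capacity of a single message and of
// each part of a long (concatenated) message.
const LIMITS = {
  gsm7: { single: 160, multi: 153 },
  octet8: { single: 140, multi: 134 },
  ucs2: { single: 70, multi: 67 },
};

// Hypothetical helper: number of messages needed for `units`.
// Simplification: it does not account for two-unit characters that
// must not be split across a part boundary.
function countParts(units, encoding) {
  const { single, multi } = LIMITS[encoding];
  if (units <= single) return 1;
  return Math.ceil(units / multi);
}

console.log(countParts(160, "gsm7")); // 1
console.log(countParts(161, "gsm7")); // 2
console.log(countParts(307, "gsm7")); // 3
console.log(countParts(71, "ucs2"));  // 2
```

Note the jump at the boundary: 161 septets already need two messages, because each part then holds only 153.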

likerRr commented 8 years ago

@babca thx, it's very useful. I'll take a look :+1:

likerRr commented 3 years ago

> UTF-16 is not used in SMS

In practice it is used, to send emojis.

I guess you are right. I wrote this lib ages ago, when emojis were not so common, and I didn't take them into account. If you would like to open a PR, I can accept it.
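To illustrate the emoji point, a small sketch of how 16-bit units could be counted in JavaScript. Strings there are sequences of UTF-16 code units, so an emoji outside the Basic Multilingual Plane takes two units. Both function names are hypothetical and not taken from the library.

```javascript
// Hypothetical helper: 16-bit units a text occupies in a 16-bit encoded SMS.
// `length` counts UTF-16 code units, so a surrogate pair counts as 2.
function countUtf16Units(text) {
  return text.length;
}

// Hypothetical helper: true if the text contains a character outside the
// Basic Multilingual Plane, which plain UCS-2 cannot represent and which
// UTF-16 encodes as a surrogate pair.
function hasAstralCharacters(text) {
  for (const ch of text) { // for...of iterates by code point
    if (ch.codePointAt(0) > 0xffff) return true;
  }
  return false;
}

const thumbsUp = "hi \u{1F44D}"; // "hi " followed by the thumbs-up emoji

console.log(countUtf16Units("hi"));         // 2
console.log(countUtf16Units(thumbsUp));     // 5 (3 + 2 for the emoji)
console.log(hasAstralCharacters(thumbsUp)); // true
```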