display problem for multipart SMS with default 7-bit alphabet Data Coding Scheme

fadasi commented 7 years ago

Multipart SMS with UCS2 Data Coding Scheme (TP-DCS = 8) are displayed correctly.

But I have a display problem for multipart SMS with default 7-bit alphabet Data Coding Scheme (TP-DCS = 0).

Display not Ok with ...

... SMS 1 with 3 parts (Message: 11111111111111111111...):

07913306000000F0 44 0B913306000000F0 0000 61011022113380 A0 050003CB0301 62B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562
07913306000000F0 44 0B913306000000F0 0000 61011022114380 A0 050003CB0302 62B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562
07913306000000F0 64 0B913306000000F0 0000 61011022115380 31 050003CB0303 62B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC56231

... SMS 2 with 2 parts (Message: Aaaaa Aaaaa ...):

07913306000000F0 44 0B913306000000F0 0000 61011012939280 A0 050003CA0201 82E170380C0A86C3E13028180E87C3A060381C0E8382E170380C0A86C3E13028180E87C3A060381C0E8382E170380C0A86C3E13028180E87C3A060381C0E8382E170380C0A86C3E13028180E87C3A060381C0E8382E170380C0A86C3E13028180E87C3A060381C0E8382E170380C0A86C3E13028180E87C3A060381C0E8382E170380C0A86C3
07913306000000F0 64 0B913306000000F0 0000 61011012930380 1B 050003CA0202 C26150301C0E8741C170381C0605C3E17018

Display Ok with ...

... SMS 3, starting like SMS 1 but truncated to fit in a single part (Message: 11111111111111111111...):

07913306000000F0 24 0B913306000000F0 0000 61011022136280 A0 B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562B1582C168BC562

... SMS 4, starting like SMS 2 but truncated to fit in a single part (Message: Aaaaa Aaaaa ...):

07913306000000F0 24 0B913306000000F0 0000 61011022724280 A0 C170381C0605C3E17018140C87C36150301C0E8741C170381C0605C3E17018140C87C36150301C0E8741C170381C0605C3E17018140C87C36150301C0E8741C170381C0605C3E17018140C87C36150301C0E8741C170381C0605C3E17018140C87C36150301C0E8741C170381C0605C3E17018140C87C36150301C0E8741C170381C0605C3E17018140C87C3

(You can decode the PDUs using https://www.diafaan.com/sms-tutorials/gsm-modem-tutorial/online-sms-pdu-decoder/ or http://smstools3.kekekasvi.com/topic.php?id=288)

I tried to analyze the data. My understanding is that, for multipart SMS with the default 7-bit alphabet Data Coding Scheme (TP-DCS = 0), a padding bit is added to the first byte of the message text, just after the UDH (the 6-octet UDH takes 48 bits, one short of 7 septets, so one fill bit brings the text to a septet boundary). To display the text, the first byte must be shifted right by one bit to remove the padding bit; the following bytes are then decoded normally.
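A minimal C sketch of this rule, assuming the TP-UD octets have already been converted from hex to bytes. The helper name and parameters are hypothetical, not chan_dongle's API. It works out how many fill bits follow the UDH (for the 6-octet UDH here: 48 bits, so 1 fill bit) and unpacks the text septets from the next septet boundary. It returns raw GSM 03.38 default-alphabet codes; these equal ASCII for digits, space and basic Latin letters, but a real decoder still needs the full alphabet table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of chan_dongle): unpack GSM 7-bit
 * default-alphabet septets that follow a UDH.
 *   ud         first octet of TP-UD (UDH included), already hex-decoded
 *   udl        TP-UDL, counted in septets and including the UDH
 *   udh_octets UDH size in octets including the UDHL octet (0 if no UDH)
 *   out        receives the raw septet values; needs room for udl entries
 * Returns the number of text septets written. */
static size_t unpack_septets_after_udh(const uint8_t *ud, size_t udl,
                                       size_t udh_octets, uint8_t *out)
{
    size_t udh_bits = udh_octets * 8;
    /* Fill bits pad the UDH up to the next septet boundary. */
    size_t fill_bits = (7 - udh_bits % 7) % 7;
    size_t bit = udh_bits + fill_bits;   /* position of the first text septet */
    size_t n = 0;

    for (size_t i = bit / 7; i < udl; i++, bit += 7) {
        size_t byte = bit / 8, shift = bit % 8;
        unsigned v = ud[byte] >> shift;           /* low bits of the septet */
        if (shift > 1)                            /* rest is in the next octet */
            v |= (unsigned)ud[byte + 1] << (8 - shift);
        out[n++] = (uint8_t)(v & 0x7F);
    }
    return n;
}
```

Fed the user data of SMS 1 part 3 (TP-UDL 0x31 = 49 septets, UDH 05 00 03 CB 03 03), it should return 42 septets, all 0x31 ('1'); with udh_octets = 0 it behaves like an ordinary single-part decoder.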

In my SMS 1 part 1 (Message: 11111111111111111111...):

first byte is 0x62, and 0x62 >> 1 = 0x31 = '1'
the next bytes B1582C168BC5... decode correctly as '1111111111111111111...'

In my SMS 2 part 1 (Message: Aaaaa Aaaaa ...):

first byte is 0x82, and 0x82 >> 1 = 0x41 = 'A'
the next bytes E170380C0A86C3E1302818... decode correctly as 'aaaa Aaaaa A...'
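The same shift shows up when packing. This hypothetical C helper, written only to reproduce the octets quoted above and assuming the septet values are already GSM 03.38 codes, packs septets least-significant bit first after a given number of leading fill bits. With one fill bit, '1' (0x31) packs to 0x62 and 'A' (0x41) to 0x82; with no fill bits, eight '1' characters pack to B1 58 2C 16 8B C5 62, the group that repeats through SMS 1 and SMS 3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack GSM 7-bit septets least-significant bit first,
 * after `fill_bits` zero bits (0 for a plain SMS, 1 after a 6-octet UDH).
 * Returns the number of octets written, or 0 if out_len is too small. */
static size_t pack_septets(const uint8_t *septets, size_t count,
                           unsigned fill_bits, uint8_t *out, size_t out_len)
{
    size_t octets = (fill_bits + count * 7 + 7) / 8;
    if (octets > out_len)
        return 0;
    memset(out, 0, octets);
    for (size_t i = 0; i < count; i++) {
        size_t bit = fill_bits + i * 7, byte = bit / 8, shift = bit % 8;
        unsigned v = septets[i] & 0x7Fu;
        out[byte] |= (uint8_t)(v << shift);        /* bits that fit here */
        if (shift > 1)                             /* spill into next octet */
            out[byte + 1] |= (uint8_t)(v >> (8 - shift));
    }
    return octets;
}
```

Packing nine '1' characters with one fill bit gives 62 B1 58 2C 16 8B C5 62, which is exactly how the text part of SMS 1 part 1 begins: the first character sits alone in the high 7 bits of the first octet, and everything after it is aligned as in a normal message.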

fadasi commented 7 years ago

I illustrated the problem:

[Image: pdu_7bit_text_encoding]

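Converting 7-bit to 8-bit should begin not just after the UDH, but at the first byte of TP-UD; the first N decoded characters, which come from the UDH, should then not be displayed. Here N = ((UDHL + 1) * 8) / 7 + (((UDHL + 1) * 8) % 7 == 0 ? 0 : 1), where UDHL is the User Data Header Length (the first UDH octet, 05 in these PDUs).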
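The formula above is the integer ceiling of (UDHL + 1) * 8 / 7, with UDHL the User Data Header Length octet. A small C sketch (function names are hypothetical) compares the ternary form with the usual (bits + 6) / 7 idiom and also derives the number of fill bits. For UDHL = 5, the concatenation IE used in these PDUs, N = 7 with 1 fill bit; for UDHL = 6, as with the 16-bit-reference concatenation IE (IEI 0x08), the 7-octet UDH is exactly 8 septets and no fill bit is needed.

```c
#include <stddef.h>

/* Hypothetical helpers. udhl is the first UDH octet (UDHL), which does not
 * count itself, so the UDH occupies (udhl + 1) octets. */

/* Septets covered by the UDH and its fill bits: ceil((udhl + 1) * 8 / 7). */
static size_t udh_septets(unsigned udhl)
{
    size_t bits = ((size_t)udhl + 1) * 8;
    return (bits + 6) / 7;
}

/* The same value written in the ternary form from the comment above. */
static size_t udh_septets_ternary(unsigned udhl)
{
    size_t bits = ((size_t)udhl + 1) * 8;
    return bits / 7 + (bits % 7 == 0 ? 0 : 1);
}

/* Fill bits between the end of the UDH and the first text septet. */
static unsigned udh_fill_bits(unsigned udhl)
{
    return (unsigned)(udh_septets(udhl) * 7 - ((size_t)udhl + 1) * 8);
}
```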

wdoekes commented 7 years ago

Just a recap:

So the 8th (zero-based) octet, right after the SMSC information, is the PDU header, which contains:

0x04 = TP-MMS = TP-More-Messages-to-Send
0x20 = TP-SRI = TP-Status-Report-Indication (on last message? not necessary)
0x40 = TP-UDHI = User Data Header Indicator

Where 0x40 is set on the multipart messages, the user data then starts with N bytes of UDH, where N = UDH[0] + 1; UDH[0] (the UDHL) is 5 when UDH[1] (IEI) == 0x00 (concatenated SMS, 8-bit reference), so N = 6 here; see https://en.wikipedia.org/wiki/User_Data_Header
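To make the recap concrete, here is a hypothetical C sketch (not chan_dongle code) that walks an SMS-DELIVER PDU, already converted from hex to bytes, up to the UDH. It skips the SMSC information, reads the first octet and TP-OA, skips TP-PID, reads TP-DCS, skips TP-SCTS, reads TP-UDL and, if TP-UDHI is set, scans the information elements for IEI 0x00. It assumes the SMS-DELIVER layout of 3GPP TS 23.040 and simply skips any other IEs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TP_MMS  0x04  /* TP-More-Messages-to-Send */
#define TP_SRI  0x20  /* TP-Status-Report-Indication */
#define TP_UDHI 0x40  /* TP-User-Data-Header-Indicator */

struct concat_info {
    uint8_t first_octet;   /* PDU header (first TPDU octet) */
    uint8_t dcs;           /* TP-DCS */
    uint8_t udl;           /* TP-UDL (septets for the 7-bit alphabet) */
    size_t  ud_offset;     /* index of the first TP-UD octet in pdu[] */
    uint8_t ref, total, seq;
};

/* Hypothetical: returns false if the PDU is too short, has no UDH, or
 * carries no 8-bit-reference concatenation IE. */
static bool parse_concat(const uint8_t *pdu, size_t len, struct concat_info *ci)
{
    if (len < 1)
        return false;
    size_t i = 1 + (size_t)pdu[0];           /* skip SMSC length + SMSC info */
    if (i + 2 > len)
        return false;
    ci->first_octet = pdu[i++];
    size_t oa_digits = pdu[i++];             /* TP-OA length in semi-octets */
    i += 1 + (oa_digits + 1) / 2;            /* type of address + digits */
    i += 1;                                  /* TP-PID */
    if (i + 9 > len)                         /* TP-DCS, TP-SCTS (7), TP-UDL */
        return false;
    ci->dcs = pdu[i++];
    i += 7;
    ci->udl = pdu[i++];
    ci->ud_offset = i;
    if (!(ci->first_octet & TP_UDHI) || i >= len)
        return false;

    size_t udh_end = i + 1 + pdu[i];         /* UDHL does not count itself */
    if (udh_end > len)
        return false;
    /* Each IE is: IEI, IEDL, then IEDL data octets. */
    for (size_t p = i + 1; p + 2 <= udh_end; p += 2 + (size_t)pdu[p + 1]) {
        if (pdu[p] == 0x00 && pdu[p + 1] == 3 && p + 5 <= udh_end) {
            ci->ref = pdu[p + 2];    /* reference number */
            ci->total = pdu[p + 3];  /* number of parts */
            ci->seq = pdu[p + 4];    /* this part's sequence number */
            return true;
        }
    }
    return false;
}
```

On SMS 1 part 3 it should report first octet 0x64 (TP-UDHI, TP-SRI and TP-MMS bits set), TP-DCS 0x00, TP-UDL 0x31 and reference 0xCB, part 3 of 3.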

As for the 7-bit decoding, you should really just decode the whole TP-UD as before and then drop the first M characters of the decoded output, where M = ceil(len(UDH) * 8 / 7) and len(UDH) includes the UDHL octet (6 here, so M = 7).
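A sketch of that approach in C, with hypothetical helper names: unpack all TP-UDL septets from the first TP-UD octet exactly as for a message without a UDH, then discard the first M. Because M * 7 bits covers the UDH plus its fill bits, this should give the same text as skipping the UDH first; the first M "characters" are just the UDH bytes reinterpreted as septets, which is why they must not be displayed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unpack `count` GSM 7-bit septets starting at bit 0,
 * least-significant bit first, as for a message without a UDH. */
static void unpack7(const uint8_t *octets, size_t count, uint8_t *out)
{
    for (size_t i = 0; i < count; i++) {
        size_t bit = i * 7, byte = bit / 8, shift = bit % 8;
        unsigned v = octets[byte] >> shift;
        if (shift > 1)
            v |= (unsigned)octets[byte + 1] << (8 - shift);
        out[i] = (uint8_t)(v & 0x7F);
    }
}

/* Decode all udl septets of TP-UD, then drop the first
 * M = ceil(udh_len * 8 / 7), where udh_len includes the UDHL octet.
 * out needs room for udl entries; returns the number of text septets. */
static size_t decode_and_drop(const uint8_t *ud, size_t udl, size_t udh_len,
                              uint8_t *out)
{
    size_t m = (udh_len * 8 + 6) / 7;
    if (m > udl)
        return 0;
    unpack7(ud, udl, out);
    memmove(out, out + m, udl - m);   /* discard the septets covering the UDH */
    return udl - m;
}
```

For SMS 2 part 2 (TP-UDL 0x1B = 27 septets, 6-octet UDH) this leaves 20 septets, starting with "aa A".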

fadasi commented 7 years ago

I'm currently testing a modified pdu.c:

sed -i 's/if(PDUTYPE_UDHI(pdu_type) == PDUTYPE_UDHI_HAS_HEADER)/if(PDU_DCS_ALPABET(dcs) != PDU_DCS_ALPABET_7BIT \&\& PDUTYPE_UDHI(pdu_type) == PDUTYPE_UDHI_HAS_HEADER)/' pdu.c

Even without removing the first (UDH) characters, the result is much better.

The solution: add a parameter 'char * udh' to the function 'pdu_parse' (EXPORT_DEF, returning 'const char *', in pdu.c), and adapt the function 'static int at_response_cmgr' (at_response.c) accordingly.

wdoekes commented 7 years ago

I happened to bump into this PR for the original bg111 fork: https://github.com/bg111/asterisk-chan-dongle/pull/214/commits/2aba9dd30da7bfe69377c7b4cbf313ce01fa347c

I only glanced at it briefly, but it may be what you're looking for?

(wdoekes/asterisk-chan-dongle, issue #13)