mdouchement / smpp

SMPP 3.4 Protocol for the Go programming language
MIT License

Bug with sending emojis in long SMS #1

Open akonovalovdev opened 2 months ago

akonovalovdev commented 2 months ago

Hello, I use the go-smpp library to send SMS messages, and I ran into a problem when sending them with UCS2 encoding.

The problem occurs when a long message consisting of several parts is sent with the method `func (t *Transmitter) SubmitLongMsg(sm *ShortMessage) ([]ShortMessage, error)`.

When an emoji lands on the boundary between two parts of the message, it is split and shows up as question marks. Sometimes this means the message is not delivered to the subscriber at all, because the mobile operator rejects it as malformed text.

Here is an example message in which every emoji that falls on a part boundary is broken into question marks, while the 3 emojis at the end are delivered correctly:

```go
shortMsg := "123456789112345678921234567893123456789412345678951234567896123456😊12345678911234567892123456789312345678941234567895123456789612345😊12345678911234567892123456789312345678941234567895123456789612😊99999999999😊😊😊"
```

I found that this happens because the code does not check for surrogate pairs when it splits a long message into parts. I haven't figured out how to fix it yet. Would you be interested in taking a look?

mdouchement commented 2 months ago

Hello,

At first glance I don't see any issue. UCS2 is handled here as UTF-16BE, where characters of the Basic Multilingual Plane are each encoded on 2 bytes.

The beginning of the method below relies on the fact that a character is encoded on 2 bytes (which usually works): https://github.com/mdouchement/smpp/blob/1d49729a21186219ce67e686a2b9d3a3c6e45525/smpp/transmitter.go#L330-L365

Surprisingly, though, the smiley is encoded on 4 bytes (a UTF-16 surrogate pair), which leads to your issue: the smiley is cut in half, which produces an invalid encoding in the short-message segments.
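One way to avoid cutting the smiley in half is to make the split surrogate-aware. Below is a minimal sketch, assuming the text has already been encoded to UTF-16BE bytes (as the UCS2 codec does) and that each segment may carry at most `maxLen` bytes. `splitUTF16BE`, `toUTF16BE` and `fromUTF16BE` are hypothetical helpers for illustration, not part of the smpp package, and the segment size actually used by `SubmitLongMsg` is not assumed here. Before each cut, the code reads the last 2-byte code unit of the segment and, if it is a high surrogate (0xD800 to 0xDBFF), ends the segment 2 bytes earlier so the whole pair moves to the next part.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// isHighSurrogate reports whether u is the first half of a UTF-16 surrogate pair.
func isHighSurrogate(u uint16) bool { return u >= 0xD800 && u <= 0xDBFF }

// splitUTF16BE is a hypothetical helper (not part of the smpp package) that cuts
// UTF-16BE bytes into chunks of at most maxLen bytes without separating a
// surrogate pair. maxLen is assumed to be even and at least 4.
func splitUTF16BE(b []byte, maxLen int) [][]byte {
	var parts [][]byte
	for len(b) > maxLen {
		n := maxLen
		// Last code unit that would end up in this chunk.
		last := uint16(b[n-2])<<8 | uint16(b[n-1])
		if isHighSurrogate(last) {
			n -= 2 // leave the whole pair for the next chunk
		}
		parts = append(parts, b[:n])
		b = b[n:]
	}
	if len(b) > 0 {
		parts = append(parts, b)
	}
	return parts
}

// toUTF16BE encodes s as big-endian UTF-16 without a BOM.
func toUTF16BE(s string) []byte {
	units := utf16.Encode([]rune(s))
	b := make([]byte, 0, 2*len(units))
	for _, u := range units {
		b = append(b, byte(u>>8), byte(u))
	}
	return b
}

// fromUTF16BE decodes big-endian UTF-16 bytes back into a string.
func fromUTF16BE(b []byte) string {
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = uint16(b[2*i])<<8 | uint16(b[2*i+1])
	}
	return string(utf16.Decode(units))
}

func main() {
	// 6 digits + the high surrogate = 7 code units = 14 bytes, so a naive
	// 14-byte cut would land in the middle of the smiley.
	for _, p := range splitUTF16BE(toUTF16BE("123456😊abc"), 14) {
		fmt.Println(fromUTF16BE(p))
	}
}
```

Running it prints `123456` and `😊abc` on separate lines: the first segment is shortened to 12 bytes instead of ending on a lone high surrogate. The trade-off is that a segment may occasionally carry 2 bytes less than the maximum.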

Here is an example: https://go.dev/play/p/V35Lk2H5v05

```go
package main

import (
    "fmt"

    "golang.org/x/text/encoding/unicode"
    "golang.org/x/text/transform"
)

func main() {
    fmt.Println("ascii:", UCS2("e").Encode())  // [0 101]: 2 bytes
    fmt.Println("accent:", UCS2("é").Encode()) // [0 233]: 2 bytes
    fmt.Println("kanji:", UCS2("葉").Encode()) // 2 bytes

    // U+1F60A is outside the Basic Multilingual Plane, so UTF-16 needs a
    // surrogate pair: [216 61 222 10], i.e. 4 bytes.
    fmt.Println("smiley:", UCS2("😊").Encode())
}

// https://github.com/mdouchement/smpp/blob/main/smpp/pdu/pdutext/ucs2.go

// UCS2 text codec.
type UCS2 []byte

// Encode to UCS2 (big-endian UTF-16, no BOM).
func (s UCS2) Encode() []byte {
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    es, _, err := transform.Bytes(e.NewEncoder(), s)
    if err != nil {
        return s // on failure, fall back to the raw input bytes
    }
    return es
}

// Decode from UCS2.
func (s UCS2) Decode() []byte {
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    es, _, err := transform.Bytes(e.NewDecoder(), s)
    if err != nil {
        return s // on failure, fall back to the raw input bytes
    }
    return es
}
```