Open akonovalovdev opened 2 months ago
Hello,
I don't see any issue after a quick look. UCS2 here is just a UTF-16BE encoding, where each character is expected to be encoded on 2 bytes.
The beginning of the method below relies on the assumption that every character is encoded on 2 bytes (which works in most cases): https://github.com/mdouchement/smpp/blob/1d49729a21186219ce67e686a2b9d3a3c6e45525/smpp/transmitter.go#L330-L365
But surprisingly, the smiley is encoded on 4 bytes, which leads to your issue. When the smiley is cut in half at a segment boundary, the resulting short-message segments contain an invalid encoding.
Here is an example: https://go.dev/play/p/V35Lk2H5v05
```go
package main

import (
	"fmt"

	"golang.org/x/text/encoding/unicode"
	"golang.org/x/text/transform"
)

func main() {
	fmt.Println("ascii:", UCS2("e").Encode())
	fmt.Println("accent:", UCS2("é").Encode())
	fmt.Println("kanji:", UCS2("θ").Encode())
	fmt.Println("smiley:", UCS2("π").Encode())
}

// https://github.com/mdouchement/smpp/blob/main/smpp/pdu/pdutext/ucs2.go

// UCS2 text codec.
type UCS2 []byte

// Encode to UCS2.
func (s UCS2) Encode() []byte {
	e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
	es, _, err := transform.Bytes(e.NewEncoder(), s)
	if err != nil {
		return s
	}
	return es
}

// Decode from UCS2.
func (s UCS2) Decode() []byte {
	e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
	es, _, err := transform.Bytes(e.NewDecoder(), s)
	if err != nil {
		return s
	}
	return es
}
```
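To make the "cut in half" effect concrete, here is a minimal sketch that uses only the standard library instead of `golang.org/x/text`. The helpers `encodeUTF16BE` and `decodeUTF16BE` are hypothetical names made up for this illustration, and U+1F600 is just an assumed example of a character outside the Basic Multilingual Plane. Such a character is encoded as two 2-byte code units (a surrogate pair), and each half decoded on its own should come back as the replacement character U+FFFD.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16BE converts a Go string to big-endian UTF-16 bytes.
// Hypothetical helper for this illustration, not part of the library.
func encodeUTF16BE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, len(units)*2)
	for _, u := range units {
		out = append(out, byte(u>>8), byte(u))
	}
	return out
}

// decodeUTF16BE converts big-endian UTF-16 bytes back to a string.
// A trailing odd byte is ignored; lone surrogates become U+FFFD.
func decodeUTF16BE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, uint16(b[i])<<8|uint16(b[i+1]))
	}
	return string(utf16.Decode(units))
}

func main() {
	smiley := encodeUTF16BE("\U0001F600") // assumed example emoji
	fmt.Printf("% x\n", smiley)           // d8 3d de 00

	// Each half of the surrogate pair is invalid on its own.
	fmt.Printf("%q\n", decodeUTF16BE(smiley[:2]))
	fmt.Printf("%q\n", decodeUTF16BE(smiley[2:]))
}
```

If a segment boundary falls between the two code units, each segment ends up with one of these invalid halves, which would explain the question marks seen on the handset.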
Hello, I use the go-smpp library to send SMS messages and have run into a problem when sending messages with UCS2 encoding.
The problem occurs when a long message consisting of several parts is sent using the method func (t Transmitter) SubmitLongMsg(sm ShortMessage) ([]ShortMessage, error).
When a smiley falls on the boundary between two parts, it is split and displayed as question marks. Sometimes the message is not delivered to the subscriber at all, because the mobile operator rejects it due to the invalid text.
Here is an example of a message in which all the emojis on the boundaries between parts are broken into question marks, while the 3 emojis at the end are delivered properly.
I found out that this happens because surrogate pairs are not checked when a long message is split into parts. I haven't been able to figure out how to fix it yet. Would you be interested in taking a look at this?
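One possible direction, sketched under assumptions rather than taken from the library: before cutting the encoded payload, check whether the last 2-byte code unit of a part is a high surrogate (0xD800 to 0xDBFF) and, if so, push it into the next part. `splitUTF16BE` and `isHighSurrogate` are hypothetical names for this illustration, and the sketch assumes the payload is already UTF-16BE and that `maxLen` is the per-part byte budget left after the UDH.

```go
package main

import "fmt"

// isHighSurrogate reports whether a big-endian UTF-16 code unit whose
// first byte is hi is a high (leading) surrogate, 0xD800-0xDBFF.
func isHighSurrogate(hi byte) bool {
	return hi >= 0xD8 && hi <= 0xDB
}

// splitUTF16BE is a hypothetical helper, not part of the library.
// It cuts UTF-16BE bytes into parts of at most maxLen bytes and never
// separates a high surrogate from the low surrogate that follows it.
func splitUTF16BE(b []byte, maxLen int) [][]byte {
	maxLen -= maxLen % 2 // keep parts aligned on 2-byte code units
	if maxLen < 4 {
		return nil // too small to hold a surrogate pair
	}
	var parts [][]byte
	for len(b) > 0 {
		n := maxLen
		if n >= len(b) {
			n = len(b) // last part takes whatever is left
		} else if isHighSurrogate(b[n-2]) {
			n -= 2 // move the lone high surrogate to the next part
		}
		parts = append(parts, b[:n])
		b = b[n:]
	}
	return parts
}

func main() {
	// "ab" followed by U+1F600 (d8 3d de 00) in UTF-16BE.
	msg := []byte{0x00, 0x61, 0x00, 0x62, 0xD8, 0x3D, 0xDE, 0x00}
	for _, p := range splitUTF16BE(msg, 6) {
		fmt.Printf("% x\n", p)
	}
	// Prints:
	// 00 61 00 62
	// d8 3d de 00
}
```

The trade-off is that a part may carry one code unit fewer than the maximum, so a message with many emojis could need an extra part compared with a naive fixed-size split.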