qotto / smspdudecoder

Python SMS PDU Decoder
MIT License

Source text doesn't equal text after encoding and decoding #9

Open mikevolgo opened 2 months ago

mikevolgo commented 2 months ago

Hi,

I've found a case where the text after encoding and decoding is not equal to the source text. Example:

>>> text = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà'
>>> print(codec.decode(codec.encode(text)) == text)
False
>>> print(codec.decode(codec.encode(text)))
ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà@

We can see that an extra "@" symbol is appended after encoding and decoding.

alexpirine commented 2 months ago

Hi,

Thanks for the report.

Do you already have an idea why this might happen?

alexpirine commented 1 month ago

It seems like you have been using the GSM encoding.

There is a caveat that requires padding in certain situations:

https://github.com/qotto/smspdudecoder/blob/master/smspdudecoder/codecs.py#L87

In your case, you should consider using the following code:

from smspdudecoder.codecs import GSM

text = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà'

assert GSM.decode(GSM.encode(text, with_padding=True), strip_padding=True) == text
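For anyone wondering where the "@" comes from: the text above is 63 characters long, so it packs into 441 bits, which leaves 7 spare bits in the last octet. Those 7 zero bits decode as one more septet, 0x00, which is "@" in the GSM alphabet. Below is a minimal, self-contained sketch of the 7-bit packing that shows the effect and the CR padding that avoids it. The names `GSM_ALPHABET`, `pack` and `unpack` are hypothetical and this is not the smspdudecoder implementation; it also ignores the extension table and the corner case of a text that really ends with CR.

```python
# Illustrative sketch only, not the library's code.
# GSM 03.38 basic character set, indexed by septet value (0x00..0x7F).
GSM_ALPHABET = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿"
    "abcdefghijklmnopqrstuvwxyzäöñüà"
)

CR = 0x0D  # septet value used as padding


def pack(text, with_padding=False):
    """Pack text into octets, 7 bits per character, low bits first."""
    septets = [GSM_ALPHABET.index(ch) for ch in text]
    # With 8n-1 septets, 7 bits of the last octet would stay unused.
    if with_padding and len(septets) % 8 == 7:
        septets.append(CR)
    bits = 0
    for i, septet in enumerate(septets):
        bits |= septet << (7 * i)
    n_octets = (7 * len(septets) + 7) // 8
    return bits.to_bytes(n_octets, "little")


def unpack(data, strip_padding=False):
    """Unpack octets into text, reading as many septets as fit."""
    bits = int.from_bytes(data, "little")
    n_septets = len(data) * 8 // 7
    text = "".join(
        GSM_ALPHABET[(bits >> (7 * i)) & 0x7F] for i in range(n_septets)
    )
    # A trailing CR that ends exactly on an octet boundary is treated as padding.
    if strip_padding and len(data) % 7 == 0 and text.endswith("\r"):
        text = text[:-1]
    return text


text = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà'

# Without padding, the 7 spare zero bits come back as "@".
assert unpack(pack(text)) == text + "@"

# With CR padding on encode and stripping on decode, the round trip is exact.
assert unpack(pack(text, with_padding=True), strip_padding=True) == text
```

The decoder cannot tell 63 characters from 64 by looking at the octets alone, since both occupy 56 octets. That is why the padding has to be agreed on by both sides, or the real septet count has to be carried separately.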

I probably need to release a new version of the package where padding is enabled by default, to be in line with the GSM specifications:

[image]