Cannot process non-latin OTR encrypted messages from Pidgin/Psi+

aelmahmoudy commented 9 years ago

Since Xabber switched to the otr4j/otr4j implementation, it has been unable to receive non-Latin OTR-encrypted messages from a Pidgin or Psi+ client, as reported by a user at https://github.com/redsolution/xabber-android/issues/385.

logcat gives the following (the full logcat can be found at redsolution/xabber-android#385):

I/System.out(12125): 11:51:41 AM RCV  (1111072848): <message id='purpleeee36e65' type='chat' to='xabberuser@server.com' from='pidginuser@server.com/1c3e9c54-2c2e-4575-994b-c9bbe19b9d88'><active xmlns='http://jabber.org/protocol/chatstates'/><body>?OTR:AAMDOdT/5wuLBjAAAAAAAQAAAAEAAADA65s1wprxbfsXHCBilc/OuQaoAlsRjQJGyVJn/qA4pgLIxiiymv6pz8YyBEaeLOtbHsW9pzWhBGfXafHp+JDuDVutduCI0K7TGRlYUL/WIYI7qRVuLCRwrWAJwH8hkN8YbxUoQdbUjITrNt2gyp+yDkDdHVTZlrF83LmxZhEfXyQ7HDvUEK54KCzRZTjc7p7xuqQzbYt67ZCVSsB72WWsTOqOtJBDJub+8aa/YoErMktbmxmdHrCOPJvKcRn7TfD4AAAAAAAAAAUAAABCQ0CityM6aunUNBAZoaookgP9st+J6lZCx14dUO8h7JpPLTCYM0Bqk40ptRywv8Bp75J2zfGmfRt5zzkpgevP+J50C7tfSVUIhCnIVKn6IYNSCucLKEQAAAAA.</body></message>
W/System.err(12125): xabberuser@server.com/androidrUmSh1VT:pidginuser@server.com
W/System.err(12125): net.java.otr4j.OtrException: java.io.IOException: Unable to read the required amount of bytes from the stream. Expected were 16718 bytes but I could only read 2 bytes.
W/System.err(12125):    at net.java.otr4j.session.Session.handleDataMessage(Session.java:643)
W/System.err(12125):    at net.java.otr4j.session.Session.transformReceiving(Session.java:455)
W/System.err(12125):    at com.xabber.android.data.extension.otr.OTRManager.transformReceiving(OTRManager.java:456)
W/System.err(12125):    at com.xabber.android.data.message.RegularChat.onPacket(RegularChat.java:144)
W/System.err(12125):    at com.xabber.android.data.message.MessageManager.onPacket(MessageManager.java:449)

eighthave commented 9 years ago

Oh interesting, I recently wrote Unicode tests for the SMP question/answer and found bugs there, but there isn't a similar test for the message body. Is this in the message body itself? Can you provide examples of how to reproduce it? Then we can add them to the test suite.

We've gotten reports of issues between ChatSecure and Pidgin with Cyrillic text.

eighthave commented 9 years ago

After thinking about it a bit, that crash report makes me think that there are nulls (0x00) in the byte stream. Null bytes are the separators in OTR packets, so if there is a null in the message, the OTR packet won't be parsed properly. With the SMP stuff, I added code that removes nulls on input. Sounds like something similar needs to happen here.
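
To make the null-byte reasoning above concrete: in the OTR protocol, the decrypted payload of a data message is the human-readable text, optionally followed by a NUL byte and TLV records. The sketch below is only an illustration of that layout, not otr4j's code; the class and method names are hypothetical, and the sample string and TLV bytes are made up. It also shows that correctly encoded UTF-8 text contains no 0x00 bytes, so non-Latin characters by themselves should not introduce a stray separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of otr4j.
final class NullSplitSketch {

    // Returns the index of the first 0x00 byte, or -1 if there is none.
    public static int firstNull(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    // Decodes everything before the first NUL (or the whole array) as UTF-8.
    public static String humanReadablePart(byte[] decrypted) {
        int end = firstNull(decrypted);
        if (end < 0) {
            end = decrypted.length;
        }
        return new String(decrypted, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Привет", written with escapes so the source file encoding does not matter.
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // UTF-8 only emits 0x00 for U+0000, so no NUL appears here.
        System.out.println("first NUL in UTF-8 text: " + firstNull(utf8));

        // Build text + NUL + illustrative TLV bytes.
        // Arrays.copyOf zero-pads, so the byte at index utf8.length is the NUL separator.
        byte[] tlv = {0x00, 0x01, 0x00, 0x00};
        byte[] payload = Arrays.copyOf(utf8, utf8.length + 1 + tlv.length);
        System.arraycopy(tlv, 0, payload, utf8.length + 1, tlv.length);

        System.out.println("text recovered: " + humanReadablePart(payload).equals(text));
    }
}
```

If a sender encoded the text in something other than UTF-8, or a NUL ended up inside the text itself, the split point would move and the recovered message would be truncated, which is the kind of failure the comment above is guessing at.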

aelmahmoudy commented 9 years ago

To reproduce it, have a Pidgin user chat with a Xabber user over an OTR-encrypted session, and have the Pidgin user send a non-Latin message, for example: "أهلا وسهلا"

aelmahmoudy commented 9 years ago

And yes, this is in the message body itself.

aelmahmoudy commented 9 years ago

This issue also happens with messages from a BitlBee client.

eighthave commented 9 years ago

This seems to be an issue of libotr versus otr4j. I was able to reproduce something like this problem with Pidgin and ChatSecure. So I wrote some tests in otr4j with lots of Unicode, but they all pass for both encrypted and plain text messages with otr4j. So the next step is getting the actual OTR packets that Pidgin/libotr is sending. Here are the test strings I used:

    public static String[] unicodes = {
            "plainAscii",
            "",
            "བོད་རིགས་ཀྱི་བོད་སྐད་བརྗོད་པ་དང་ བོད་རིགས་མང་ཆེ་བ་ནི་ནང་ཆོས་བྱེད་པ་དང་",
            "تبتی قوم (Tibetan people)",
            "Учените твърдят, че тибетците нямат",
            "Câung-cŭk (藏族, Câung-ngṳ̄: བོད་པ་)",
            "チベット系民族（チベットけいみんぞく）",
            "原始汉人与原始藏缅人约在公元前4000年左右分开。",
            "Տիբեթացիներ (ինքնանվանումը՝ պյոբա),",
            "... Gezginci olarak",
            "شْتَن Xotan",
            "Tibeťané jsou",
            "ئاچاڭ- تىبەت مىللىتى",
            "Miscellaneous Symbols and Pictographs[1][2] Official Unicode Consortium code chart (PDF)",
            "Royal Thai (ราชาศัพท์)",
            "טיילאנדיש123 (ภาษาไทย)",
            "ជើងអក្សរ cheung âksâr",
            "중화인민공화국에서는 기본적으로 한족은 ",
            "पाठ्यांशः अत्र उपलभ्यतेसर्जनसामान्यलक्षणम्/Share-",
            "திபெத்துக்கு வெகள்",
            "អក្សរសាស្រ្តខែ្មរមានប្រវ៌ត្តជាងពីរពាន់ឆ្នាំមកហើយ ",
    };
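
For readers who want to try a few of these strings outside of otr4j, here is a minimal standalone round-trip check. It is a sketch under stated assumptions, not the otr4j test suite: it encodes each string as UTF-8 into an OTR-style DATA field (a 4-byte big-endian length followed by that many bytes) and reads it back. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of otr4j.
final class DataFieldRoundTrip {

    // Encodes text as a DATA field: 4-byte big-endian byte count, then the UTF-8 bytes.
    public static byte[] writeData(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Reads the length prefix, then exactly that many bytes, and decodes them as UTF-8.
    // Throws BufferUnderflowException if fewer bytes are available than the prefix claims.
    public static String readData(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        int len = buf.getInt();
        byte[] payload = new byte[len];
        buf.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A few of the strings from the list above.
        String[] samples = {
                "plainAscii",
                "",
                "Tibeťané jsou",
                "Royal Thai (ราชาศัพท์)",
        };
        for (String s : samples) {
            String back = readData(writeData(s));
            System.out.println(s.equals(back) ? "ok" : "MISMATCH: " + s);
        }
    }
}
```

Note that the length prefix counts bytes, not characters. A round trip like this passing on one implementation says nothing about what another implementation puts on the wire, which is why capturing the actual packets from Pidgin/libotr is the next step.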
