Closed eighthave closed 4 years ago
Hmmm... that sounds reasonable. I'll have a look at this soon. I'm afraid it is possible that it is something as trivial as this, since I haven't had an opportunity of unicode + fragmentation. Will follow up soon.
Also, note that OtrFragmenter only handles outgoing messages. Inbound messages are processed by OtrAssember IIRC.
I looked a bit more into this, and currently, OtrFragmenter
and OtrAssembler
are only ever used on OTR-encoded text, so that'll always be base64, which is definitely only ASCII. So working with String.length()
is fine. But I'll let @cobratbq confirm that and close this issue.
Confirmed. Now that I think about it, I've asked myself this question earlier, and came to the same conclusion then. I think, in time we might be better off refactoring otr4j to handle encoded messages as byte[] instead of String.
I've been diving deep into Unicode issues, so that means mostly finding code where the assumption is that
String.length()
will be equal tobyte[].length
. But since UTF-8 encoding encodes characters in a variable number of bytes, that assumption is totally invalid with non-ASCII text.Maybe I'm not understanding the
OtrFragmenter
code correctly, but it looks likenumberOfFragments()
,computeFragmentNumber()
, andfragment()
all suffer from this issue.For some examples, see otr4j-temp/otr4j-issues#17 otr4j-temp/otr4j#47 otr4j-temp/otr4j#48. @devrandom maybe you have some comment on this issue also, since you've written fragmenting code for otr4j.