OtrFragmenter uses String lengths, rather than byte[] lengths

eighthave commented 9 years ago

I've been diving deep into Unicode issues, so that means mostly finding code where the assumption is that String.length() will be equal to byte[].length. But since UTF-8 encoding encodes characters in a variable number of bytes, that assumption is totally invalid with non-ASCII text.

Maybe I'm not understanding the OtrFragmenter code correctly, but it looks like numberOfFragments(), computeFragmentNumber(), and fragment() all suffer from this issue.

For some examples, see otr4j-temp/otr4j-issues#17 otr4j-temp/otr4j#47 otr4j-temp/otr4j#48. @devrandom maybe you have some comment on this issue also, since you've written fragmenting code for otr4j.

cobratbq commented 9 years ago

Hmmm... that sounds reasonable. I'll have a look at this soon. I'm afraid it is possible that it is something as trivial as this, since I haven't had an opportunity of unicode + fragmentation. Will follow up soon.

cobratbq commented 9 years ago

Also, note that OtrFragmenter only handles outgoing messages. Inbound messages are processed by OtrAssember IIRC.

eighthave commented 9 years ago

I looked a bit more into this, and currently, OtrFragmenter and OtrAssembler are only ever used on OTR-encoded text, so that'll always be base64, which is definitely only ASCII. So working with String.length() is fine. But I'll let @cobratbq confirm that and close this issue.

cobratbq commented 9 years ago

Confirmed. Now that I think about it, I've asked myself this question earlier, and came to the same conclusion then. I think, in time we might be better off refactoring otr4j to handle encoded messages as byte[] instead of String.

otr4j / otr4j-issues

OtrFragmenter uses String lengths, rather than byte[] lengths #18