raphlinus / font-rs

Apache License 2.0
753 stars 49 forks source link

Fix overflow bug with code points larger than 2^15 #31

Closed golddranks closed 5 years ago

golddranks commented 5 years ago

Hi, I encountered a bug where it crashes because of an overflow with code points larger than 2^15 on conversion from u16 to i16. Big codepoints are pretty common with Japanese Kanji so it's a serious bug.

I'm not aware of the filetype details, but generally Unicode codepoints are 21 bits width, so it strikes me odd to use u16 for those, but maybe it's a limitation of TrueType? If it isn't and codepoints outside of the basic multilingual plane aren't just supported yet, I can try and help to refactor the code base.

golddranks commented 5 years ago

I was considering implementing the fix with a wrapping add, but there's a problem with that: code_point is a u16 and id_delta is an i16, and wrapping_add in the stdlib isn't defined between those. In this corner case, the theoretical range of either is not superset of the other (code_point might be over 2^15-1 and id_delta might be negative) so one can't safely convert without widening.

golddranks commented 5 years ago

I'll see if I can send some PR's to extend the code base since I might be needing that functionality.

golddranks commented 5 years ago

(Btw. I signed the Google Individual Contributor License Agreement.)

raphlinus commented 5 years ago

Inside EncodingFormat4, code_point is limited to 16 bit values. I believe this computation might wrap and that's the behavior specified by the standard. Thus, I think keeping a 16 bit type here is appropriate.

When EncodingFormat12 is implemented, its representation of code point should definitely be a 32 bit value.

I'm happy to answer questions. Also I know @rtsuk has looked at this, I believe he's interested in correct rendering of (b/w) emoji.

Lastly, the Google CLA is not needed anymore, the ownership of this repo has now passed to me. I should probably update the READMEs and so on. But thanks!

golddranks commented 5 years ago

So, I don't think there is a "wrapping conversion" of i16 to u16 or vice versa in stdlib, so I was forced to define one by a transmute. I feel like this is an omission in the stdlib, and it feels bad to introduce unsafe code for a thing like this. What do you think about the newest commit?

raphlinus commented 5 years ago

Are you sure the as panics? I just checked in the playground and it doesn't. The nomicon says it's a "no-op", which is maddeningly vague; I think they mean the bit representation doesn't change, which is what I expect.

There's another way to write this which is maybe more explicit about the intent: ((code_point as i32) + (id_delta as i32) & 0xffff) as u16. But I think just wrapping_add should work.

golddranks commented 5 years ago

Huh, I thought it would panic if it gets out of range in as conversion. Apparently it doesn't!

golddranks commented 5 years ago

Do you want me to squash?

golddranks commented 5 years ago

And sorry about wasting your time, my mental model how as works was wrong; I thought it would use the same logic as the other operations. (Panic on overflow in debug mode.)

raphlinus commented 5 years ago

No worries, it's good to sweat the details sometimes. I'll squash on merge as soon as CI is green. Thanks!

rtsuk commented 5 years ago

I'm pretty sure this was my error. Thanks for fixing it!