utf8makevalid : test to identify sequence length and possible values not sufficient

sheredom / utf8.h

📚 single header utf8 string functions for C and C++

The Unlicense


utf8makevalid : test to identify sequence length and possible values not sufficient #118

Open JPDelprat opened 9 months ago

JPDelprat commented 9 months ago

Hello,

In utf8makevalid, you use the following test to identify a 4-byte sequence:

"if (0xf0 == (0xf8 & *read))"

This is not correct if the input can be an arbitrary invalid string: the test matches every lead byte from f0 to f7, but only f0..f4 are valid in UTF-8.

Moreover, for the valid lead bytes, the allowed range of the second byte is not always the same. For example, after f0 the second byte must be in 90..bf rather than 80..bf.
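
To illustrate the stricter check, here is a minimal sketch based on the well-formed byte ranges in RFC 3629. The helper name `is_valid_4byte_seq` and its `remaining` parameter are hypothetical and not part of utf8.h; this is only one possible way to tighten the test, not a proposed patch.

```c
#include <stddef.h>

/* Hypothetical helper (not part of utf8.h): returns 1 if the bytes at s
 * start a well-formed 4-byte UTF-8 sequence, 0 otherwise.
 * remaining is the number of readable bytes at s. */
static int is_valid_4byte_seq(const unsigned char *s, size_t remaining) {
  unsigned char lo = 0x80; /* default lower bound for the second byte */
  unsigned char hi = 0xBF; /* default upper bound for the second byte */

  if (remaining < 4) {
    return 0;
  }

  /* Only f0..f4 can start a 4-byte sequence; f5..ff are never valid. */
  if (s[0] < 0xF0 || s[0] > 0xF4) {
    return 0;
  }

  /* The second byte's range depends on the lead byte. */
  if (s[0] == 0xF0) {
    lo = 0x90; /* 80..8f would be an overlong encoding */
  } else if (s[0] == 0xF4) {
    hi = 0x8F; /* 90..bf would exceed U+10FFFF */
  }
  if (s[1] < lo || s[1] > hi) {
    return 0;
  }

  /* Third and fourth bytes are plain continuation bytes (80..bf). */
  if ((s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80) {
    return 0;
  }

  return 1;
}
```

With this check, f0 9f 98 80 (U+1F600) is accepted, while f0 80 80 80 (overlong), f4 90 80 80 (above U+10FFFF) and f5 80 80 80 (invalid lead byte) are rejected, even though all four pass the 0xf8 mask test.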

Regards

sheredom commented 9 months ago

I'd happily accept a PR that tightened this up with the supporting testing!