jridderbusch opened 6 months ago
We have standard Angular validators for the form fields. They seem to be well tested and handle such symbols correctly.
I believe there may have been a misunderstanding here.
UTF-8's design theoretically allows code points to be represented in different ways. Overlong UTF-8 sequences use more bytes than strictly required, while still decoding to the same code point. For example, the ASCII space character ` ` (U+0020) is normally encoded as the single byte `0x20`. However, if you decode `0xc0 0xa0` by applying the UTF-8 bit-pattern rules without rejecting overlong forms, you also get U+0020 back.[^stackoverflow] Conforming decoders must reject such sequences (RFC 3629), but a lenient decoder may not.
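To make the arithmetic concrete, here is a minimal Java sketch of a naive two-byte decoder that extracts the payload bits without checking for the shortest form. The class and method names are illustrative only and are not part of EDC.

```java
public class OverlongDemo {
    // Naive decoding of a two-byte UTF-8 sequence (110xxxxx 10yyyyyy):
    // it extracts the payload bits but does NOT reject overlong forms.
    static int decodeTwoByteNaive(int b0, int b1) {
        return ((b0 & 0x1F) << 6) | (b1 & 0x3F);
    }

    // A two-byte sequence is only the shortest form for U+0080..U+07FF,
    // so any smaller result means the encoding was overlong.
    static boolean isOverlong(int codePoint) {
        return codePoint < 0x80;
    }

    public static void main(String[] args) {
        int cp = decodeTwoByteNaive(0xC0, 0xA0);
        System.out.printf("U+%04X overlong=%b%n", cp, isOverlong(cp));
    }
}
```

Running it prints `U+0020 overlong=true`: the five payload bits of `0xc0` are all zero and the six payload bits of `0xa0` are `100000`, i.e. `0x20`. A strict decoder would reject this input instead of returning U+0020.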
The concern is that if software operates directly on UTF-8-encoded strings, such encodings could be used to bypass validation checks. In the space-character example above, a validation that checks that an input contains no whitespace may naively look only for the byte `0x20`, which can cause it to miss some occurrences if the input is not normalized beforehand.
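A minimal Java sketch of that bypass, assuming a hypothetical validator that scans raw bytes (`WhitespaceCheck` and its methods are illustrative names, not EDC code). The byte scan misses the overlong space, while decoding strictly first with the JDK's `CharsetDecoder` set to `CodingErrorAction.REPORT` surfaces the problem as an exception.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class WhitespaceCheck {
    // Naive check working directly on UTF-8 bytes: only looks for the byte 0x20.
    static boolean containsSpaceByte(byte[] utf8) {
        for (byte b : utf8) {
            if (b == 0x20) return true;
        }
        return false;
    }

    // Safer approach: decode strictly first (malformed input, including
    // overlong sequences, raises an exception), then validate the decoded text.
    static String decodeStrict(byte[] utf8) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8))
                .toString();
    }

    public static void main(String[] args) {
        byte[] input = {'a', (byte) 0xC0, (byte) 0xA0, 'b'};
        System.out.println("naive byte check finds space: " + containsSpaceByte(input));
        try {
            String s = decodeStrict(input);
            System.out.println("contains whitespace: " + s.chars().anyMatch(Character::isWhitespace));
        } catch (CharacterCodingException e) {
            System.out.println("rejected as malformed UTF-8");
        }
    }
}
```

On current JDKs the naive check should report no space and the strict decoder should reject the input, which is why validating decoded text (or rejecting malformed bytes up front) is the safer order.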
Since this concerns input validation, I believe it is a backend issue, rather than (just) a frontend issue.
Footnotes
Thank you for the information!
Is this really an issue for our repo, or should it be addressed in Core EDC? cc @efiege
Both, since we would have to investigate the behavior of both upstream and our custom code.
Enhancement
Description
Investigate the behavior when input contains overlong UTF-8 sequences, i.e. check whether string validation can be bypassed. This should be fine, since Java decodes UTF-8 input into UTF-16 strings before exposing it to application code, but it is unclear whether the JSON parser reads the UTF-8 byte stream directly.
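As a possible first step, a minimal probe of the JDK path the description relies on (`DecodeProbe` is a hypothetical name). It only covers `new String(byte[], Charset)`, which replaces malformed input with U+FFFD instead of throwing; whether the JSON parser used upstream decodes bytes on its own still has to be checked separately.

```java
import java.nio.charset.StandardCharsets;

public class DecodeProbe {
    // Decodes bytes the way most Java code does: lenient, malformed input
    // is replaced with U+FFFD instead of raising an exception.
    static String decodeLenient(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] overlongSpace = {'a', (byte) 0xC0, (byte) 0xA0, 'b'};
        String s = decodeLenient(overlongSpace);
        // If the overlong form was rejected, no U+0020 appears and
        // replacement characters (U+FFFD) take its place.
        System.out.println("contains U+0020: " + (s.indexOf(' ') >= 0));
        System.out.println("contains U+FFFD: " + (s.indexOf('\uFFFD') >= 0));
        s.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

The same byte array could then be fed to the JSON parsing entry points in EDC and in our custom code to compare results.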
Stakeholders
@sybereal
Solution Proposal and Work Breakdown