Doc comments in our Protobuf code are typed as strings,
which means their contents must be valid UTF-8. However,
we were not validating the contents before storing them.
This PR adds a validation step. If validation fails,
we replace the invalid byte sequences with the standard
Unicode replacement character (U+FFFD).
The utfcpp library was chosen because it performs well in benchmarks
and has an API that is very easy to use for our purposes.
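
To illustrate the validate-then-replace behavior described above, here is a
minimal standard-library-only sketch. It is not the code in this PR and it
does not use utfcpp. The function names (valid_sequence_length, is_valid_utf8,
sanitize_doc_comment) are hypothetical, and the policy of emitting one U+FFFD
per offending byte is an assumption that may differ from what utfcpp does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the length (1-4) of the well-formed UTF-8
// sequence starting at s[i], or 0 if the bytes there are not well-formed.
// Precondition: i < s.size().
inline std::size_t valid_sequence_length(const std::string& s, std::size_t i) {
    const auto byte = [&s](std::size_t k) {
        return static_cast<std::uint8_t>(s[k]);
    };
    const std::uint8_t b0 = byte(i);
    if (b0 < 0x80) return 1;  // ASCII

    std::size_t len = 0;
    std::uint32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return 0;  // stray continuation byte or invalid lead byte

    if (i + len > s.size()) return 0;  // sequence is truncated

    for (std::size_t k = 1; k < len; ++k) {
        const std::uint8_t b = byte(i + k);
        if ((b & 0xC0) != 0x80) return 0;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF.
    static const std::uint32_t kMinCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMinCodePoint[len]) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp > 0x10FFFF) return 0;
    return len;
}

// Hypothetical helper: true if the whole string is well-formed UTF-8.
inline bool is_valid_utf8(const std::string& s) {
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len = valid_sequence_length(s, i);
        if (len == 0) return false;
        i += len;
    }
    return true;
}

// Hypothetical helper: returns s unchanged if it is valid UTF-8; otherwise
// returns a copy in which each offending byte is replaced by U+FFFD.
inline std::string sanitize_doc_comment(const std::string& s) {
    if (is_valid_utf8(s)) return s;  // common case: nothing to fix

    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len = valid_sequence_length(s, i);
        if (len == 0) {
            out += "\xEF\xBF\xBD";  // UTF-8 encoding of U+FFFD
            i += 1;
        } else {
            out.append(s, i, len);
            i += len;
        }
    }
    return out;
}
```

In this sketch, valid input such as "hello" is returned unchanged, while an
input ending in the byte 0xFF comes back with that byte replaced by the
three-byte encoding of U+FFFD. Checking validity first keeps the common path
free of any copying beyond the return value.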