fix: Replace invalid UTF-8 characters in doc comments

Doc comments are typed as strings in our Protobuf code, which means their contents must be valid UTF-8. However, we were not validating the contents before storing them.

This PR adds a validation step. If validation fails, we replace the invalid byte sequences with the standard Unicode replacement character (U+FFFD).
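As a rough illustration of the validate-then-substitute idea, here is a minimal standard-library-only sketch. It is not the code in this PR and does not use utfcpp. The names `validSequenceLength` and `replaceInvalidUtf8` are hypothetical, and emitting one U+FFFD per offending byte is an assumption of this sketch; the library may group invalid bytes differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length (1-4) of the well-formed UTF-8
// sequence starting at s[i], or 0 if the bytes there are not well-formed.
inline std::size_t validSequenceLength(const std::string &s, std::size_t i) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    auto isCont = [&](std::size_t k) {
        return k < s.size() && (byte(k) & 0xC0) == 0x80;
    };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) return 1;                               // ASCII
    if (b0 >= 0xC2 && b0 <= 0xDF) return isCont(i + 1) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (!isCont(i + 1) || !isCont(i + 2)) return 0;
        const unsigned char b1 = byte(i + 1);
        if (b0 == 0xE0 && b1 < 0xA0) return 0;             // overlong encoding
        if (b0 == 0xED && b1 > 0x9F) return 0;             // UTF-16 surrogate
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (!isCont(i + 1) || !isCont(i + 2) || !isCont(i + 3)) return 0;
        const unsigned char b1 = byte(i + 1);
        if (b0 == 0xF0 && b1 < 0x90) return 0;             // overlong encoding
        if (b0 == 0xF4 && b1 > 0x8F) return 0;             // above U+10FFFF
        return 4;
    }
    return 0; // stray continuation byte, or 0xC0, 0xC1, 0xF5..0xFF
}

// Hypothetical helper: copies well-formed sequences unchanged and emits
// U+FFFD (encoded as EF BF BD) for every byte that cannot start one.
inline std::string replaceInvalidUtf8(const std::string &input) {
    std::string out;
    out.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size()) {
        const std::size_t n = validSequenceLength(input, i);
        if (n == 0) {
            out += "\xEF\xBF\xBD";
            ++i;
        } else {
            out.append(input, i, n);
            i += n;
        }
    }
    return out;
}
```

Valid input passes through byte-for-byte, so the substitution only changes comments that would otherwise have been rejected as Protobuf strings.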

We chose the utfcpp library because it performs reasonably well in benchmarks and its API is simple to use for our purposes.

sourcegraph/scip-clang#453