open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0
811 stars 391 forks source link

[Code health] Remove Unicode Text from Source files #2707

Closed perhapsmaple closed 1 week ago

perhapsmaple commented 1 week ago

Fixes #2706

Changes

Removed Unicode characters from API headers and opentracing-shim/src/span_shim.cc.

Escaped UTF-8 characters from test file data:

./ext/test/http/url_parser_test.cc: c program text, Unicode text, UTF-8 text
./sdk/test/metrics/instrument_metadata_validator_test.cc: c program text, Unicode text, UTF-8 text

For significant contributions please make sure you have completed the following items:

marcalff commented 1 week ago

For UTF-8 character still present, like in:

./ext/test/http/url_parser_test.cc: c program text, Unicode text, UTF-8 text

The offending source code is:

      {"%C3%B6%C3%A0%C2%A7%C3%96abcd%C3%84", "öà§ÖabcdÄ"},

Instead of using the character ö in the expected string, which corresponds to C3B6:

U+00F6  ö   c3 b6   LATIN SMALL LETTER O WITH DIAERESIS

could the code be changed to use unicode escape sequences like \u00F6 instead, so all the code is UTF-8 clean.

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.67%. Comparing base (497eaf4) to head (0a8dc8d). Report is 86 commits behind head on main.

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/open-telemetry/opentelemetry-cpp/pull/2707/graphs/tree.svg?width=650&height=150&src=pr&token=FJESTYQ2AD&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry)](https://app.codecov.io/gh/open-telemetry/opentelemetry-cpp/pull/2707?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry) ```diff @@ Coverage Diff @@ ## main #2707 +/- ## ========================================== + Coverage 87.12% 87.67% +0.55% ========================================== Files 200 190 -10 Lines 6109 5853 -256 ========================================== - Hits 5322 5131 -191 + Misses 787 722 -65 ``` | [Files](https://app.codecov.io/gh/open-telemetry/opentelemetry-cpp/pull/2707?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry) | Coverage Δ | | |---|---|---| | [api/include/opentelemetry/baggage/baggage.h](https://app.codecov.io/gh/open-telemetry/opentelemetry-cpp/pull/2707?src=pr&el=tree&filepath=api%2Finclude%2Fopentelemetry%2Fbaggage%2Fbaggage.h&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry#diff-YXBpL2luY2x1ZGUvb3BlbnRlbGVtZXRyeS9iYWdnYWdlL2JhZ2dhZ2UuaA==) | `97.30% <ø> (ø)` | | | [api/include/opentelemetry/context/context.h](https://app.codecov.io/gh/open-telemetry/opentelemetry-cpp/pull/2707?src=pr&el=tree&filepath=api%2Finclude%2Fopentelemetry%2Fcontext%2Fcontext.h&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry#diff-YXBpL2luY2x1ZGUvb3BlbnRlbGVtZXRyeS9jb250ZXh0L2NvbnRleHQuaA==) | `100.00% <ø> (ø)` | | | [...pi/include/opentelemetry/context/runtime\_context.h](https://app.codecov.io/gh/open-telemetry/opentelemetry-cpp/pull/2707?src=pr&el=tree&filepath=api%2Finclude%2Fopentelemetry%2Fcontext%2Fruntime_context.h&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry#diff-YXBpL2luY2x1ZGUvb3BlbnRlbGVtZXRyeS9jb250ZXh0L3J1bnRpbWVfY29udGV4dC5o) | `97.50% <ø> (-0.09%)` | :arrow_down: | ... and [105 files with indirect coverage changes](https://app.codecov.io/gh/open-telemetry/opentelemetry-cpp/pull/2707/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-telemetry)
marcalff commented 1 week ago

Instead of using the character ö in the expected string, which corresponds to C3B6:

U+00F6    ö   c3 b6   LATIN SMALL LETTER O WITH DIAERESIS

could the code be changed to use unicode escape sequences like \u00F6 instead, so all the code is UTF-8 clean.

... and of course my suggestion does not work for windows / msvc for some reason (CI failures), sorry about that.

From what I could find, windows may need a std::wstring and a L"\u00F6" literal, but since no code in opentelemetry-cpp uses wstring currently, this is likely to open a can of worms.

A possible work around is to code the UTF-8 data instead : "\xC3\xB6", please try.

If this still fails, don't waste time on it, and remove changes from the two test files, these will be fixed separately.

Thanks