jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

Unit test to reproduce data race when reloading TLS config #6213

Open chahatsagarmain opened 1 week ago

chahatsagarmain commented 1 week ago

Which problem is this PR solving?

Description of the changes

How was this change tested?

Checklist

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 48.76%. Comparing base (2b7cf3a) to head (ee64cb4). Report is 11 commits behind head on main.

:exclamation: There is a different number of reports uploaded between BASE (2b7cf3a) and HEAD (ee64cb4). Click for more details.

HEAD has 1 upload less than BASE

| Flag | BASE (2b7cf3a) | HEAD (ee64cb4) |
|------|------|------|
|unittests|1|0|

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main    #6213       +/-   ##
===========================================
- Coverage   96.50%   48.76%   -47.75%
===========================================
  Files         354      179      -175
  Lines       20127    10803     -9324
===========================================
- Hits        19424     5268    -14156
- Misses        520     5092     +4572
- Partials      183      443      +260
```

| [Flag](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | Coverage Δ | |
|---|---|---|
| [badger_v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `8.31% <ø> (ø)` | |
| [badger_v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `1.67% <ø> (-0.01%)` | :arrow_down: |
| [cassandra-4.x-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `14.39% <ø> (ø)` | |
| [cassandra-4.x-v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `1.61% <ø> (-0.01%)` | :arrow_down: |
| [cassandra-5.x-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `14.39% <ø> (ø)` | |
| [cassandra-5.x-v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `1.61% <ø> (-0.01%)` | :arrow_down: |
| [elasticsearch-6.x-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `18.59% <ø> (ø)` | |
| [elasticsearch-7.x-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `18.68% <ø> (ø)` | |
| [elasticsearch-8.x-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `18.85% <ø> (ø)` | |
| [elasticsearch-8.x-v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `1.66% <ø> (-0.02%)` | :arrow_down: |
| [grpc_v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `9.44% <ø> (-0.04%)` | :arrow_down: |
| [grpc_v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `6.97% <ø> (-0.04%)` | :arrow_down: |
| [kafka-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `8.88% <ø> (ø)` | |
| [kafka-v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `1.67% <ø> (-0.01%)` | :arrow_down: |
| [memory_v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `1.67% <ø> (+<0.01%)` | :arrow_up: |
| [opensearch-1.x-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `18.73% <ø> (ø)` | |
| [opensearch-2.x-v1](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `18.72% <ø> (-0.01%)` | :arrow_down: |
| [opensearch-2.x-v2](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `1.67% <ø> (-0.01%)` | :arrow_down: |
| [tailsampling-processor](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `0.46% <ø> (-0.01%)` | :arrow_down: |
| [unittests](https://app.codecov.io/gh/jaegertracing/jaeger/pull/6213/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing) | `?` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jaegertracing#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.

yurishkuro commented 1 week ago

Have you observed this new test fail? It doesn't look like it's testing the race condition deterministically.

chahatsagarmain commented 1 week ago

> Have you observed this new test fail? It doesn't look like it's testing the race condition deterministically.

In the previous test, the race condition occurred because the mutex was not used. Adding the mutex resolved the issue. However, in the updated test, new TLS configurations are applied, and this seems to cause a data race during client connection as well as a TLS handshake error.

yurishkuro commented 1 week ago

So I see this test is failing on a detected race condition, but the stack traces indicate that the race is caused by the test itself (the Write part), not by the production code:

WARNING: DATA RACE
Write at 0x00c000138400 by goroutine 163:
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.TestCertificateRaceCondition.func4()
      /home/runner/work/jaeger/jaeger/pkg/config/tlscfg/options_test.go:392 +0xdc

Previous read at 0x00c000138400 by goroutine 162:
  github.com/jaegertracing/jaeger/pkg/config/tlscfg.(*Options).Config()
      /home/runner/work/jaeger/jaeger/pkg/config/tlscfg/options.go:75 +0x676

yurishkuro commented 5 days ago

@chahatsagarmain an alternative to fixing the race condition from reloading is to eliminate the use of this package altogether. We already migrated several endpoints to use OTEL helpers which internally handle TLS reloading differently (on a timeout rather than on file change). It would be interesting to see which parts of Jaeger still use the tlscfg package and switch to OTEL helpers.

chahatsagarmain commented 4 days ago

@yurishkuro So I can use configtls from OTEL to replace the usage of the tlscfg package? Also, tlscfg is used in the collector and agent, and mostly in test files.

yurishkuro commented 4 days ago

Yes, that would be good. You can start small, e.g., can we remove the tlscfg dependency from cmd/es-rollover?