jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.54k stars 2.44k forks source link

Improve unit test speed #6111

Open yurishkuro opened 1 month ago

yurishkuro commented 1 month ago

This blog post provides a good background / introduction: https://threedots.tech/post/go-test-parallelism/

When I ran the following command

GOMAXPROCS=1 go test -parallel 128 -p 16 -json ./... | go run github.com/roblaszczak/vgt@latest

I got the following output:

Image

It shows some tests running for a very long time (the darker blue colors), but even of some of them were sped up there are others (like in the first group in top left corner) that are still relatively long.

We also almost never use t.Parallel() in our test (right now I only found one instance), even though many of our tests are starting servers and making network calls, so could benefit from parallelism. But adding t.Parallel() should be done with care - some tests are starting servers on the same port (instead of ephemeral port) and must be refactored before adding parallelism.

codesmith25103 commented 1 month ago

Hey @yurishkuro. I wish to work on this issue. Although I am new to open source but I am eager to contribute in Jaeger. I have already setup code in my local machine

codesmith25103 commented 4 weeks ago

"Hey @yurishkuro, I added t.parallel() to the test function that was using ephemeral ports and got the following result."

Does that look right?

Image

mmorel-35 commented 2 weeks ago

I just found this https://golangci-lint.run/usage/linters/#paralleltest I hope it can help

yurishkuro commented 2 weeks ago

I'd be careful with that linter - not all tests benefit from parallel, it can often make things worse. I'd rather treat this as performance optimization problem - measure first, fix where it makes a difference.

vaidikcode commented 2 weeks ago

Hello @yurishkuro , Some observations: Usage of t.Parallel(): I've noticed that adding t.Parallel() generally only makes sense in tests with blocking operations, and even then, the impact on overall test speed can be minimal for some cases. Impact on Test Tables: The most significant improvement in test speed has been within test tables. I was able to reduce the test time for the crossdock/services package from 2.859 seconds to 2.760 seconds. Given these results, I’d like to know if you’d like me to continue refining this package or focus on specific areas. Additionally, do we have a target or expected test time reduction for each package? Knowing this will help me prioritize and make the best use of parallelism where it counts.

yurishkuro commented 2 weeks ago

In the issue description there was a link to a blog post that explains how to optimize total time by looking at the trace diagram. Going from 2.8s to 2.7s is not worth the effort.