spotify / heroic

The Heroic Time Series Database
https://spotify.github.io/heroic/
Apache License 2.0

Investigate & resolve nondeterministic build errors #753

Open sming opened 3 years ago

sming commented 3 years ago

Design & Implementation Notes

feature/add-bigtable-timeout-settings-refactored

com.spotify.heroic.GrpcClusterQueryIT > distributedFilterQueryTest FAILED
    java.lang.IllegalStateException: failed to create a child event loop
        Caused by:
        io.netty.channel.ChannelException: failed to open a new selector
            Caused by:
            java.io.IOException: Too many open files
    java.lang.NullPointerException

sming commented 3 years ago

Update

Found that removing the 4x multiplier from this method:

        @Provides
        @GrpcRpcScope
        @Named("worker")
        fun worker() = NioEventLoopGroup(Runtime.getRuntime().availableProcessors() * 4)

stops the exception from being thrown. But several questions remain unanswered:

  1. Why does this happen only on my machine and Sergey's?
  2. Why does commenting out a seemingly innocuous IT (testbasicWithNoDistribution) also stop the exception from being thrown?
  3. Is the problem pertinent to production operation of Heroic, is it a quirk of our machines, or is it just something that test code will exhibit?
  4. What is the "correct" fix for this? Removing the 4x multiplier is a poor workaround at best.

(CC @malish8632)
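
One way to start on questions 1 and 3 might be to measure how close the JVM is to its file descriptor limit. The stack trace shows the failure happens while opening a new selector, and each NIO event loop opens one, so the number of descriptors in use should grow with the size of each NioEventLoopGroup. Below is a minimal diagnostic sketch that uses only the JDK, not Heroic or Netty. The helper names `fdUsage` and `descriptorsUsedBySelectors` are hypothetical, and the idea that a lower per-process limit explains why only some machines fail is an assumption that still needs checking.

```kotlin
import com.sun.management.UnixOperatingSystemMXBean
import java.lang.management.ManagementFactory
import java.nio.channels.Selector

/**
 * Returns (open, max) file descriptor counts for this JVM,
 * or null where the Unix-specific MXBean is unavailable (e.g. Windows).
 * Hypothetical helper, not part of Heroic.
 */
fun fdUsage(): Pair<Long, Long>? {
    val os = ManagementFactory.getOperatingSystemMXBean() as? UnixOperatingSystemMXBean
        ?: return null
    return os.openFileDescriptorCount to os.maxFileDescriptorCount
}

/**
 * Opens `count` selectors, reports how many descriptors the process gained
 * while they were open, then closes them again.
 * Hypothetical helper; the result is approximate because other threads
 * may open or close descriptors at the same time.
 */
fun descriptorsUsedBySelectors(count: Int): Long? {
    val before = fdUsage()?.first ?: return null
    val selectors = List(count) { Selector.open() }
    try {
        val after = fdUsage()?.first ?: return null
        return after - before
    } finally {
        selectors.forEach { it.close() }
    }
}

fun main() {
    val cores = Runtime.getRuntime().availableProcessors()
    // Same sizing rule as the worker() provider quoted above.
    val loops = cores * 4
    println("cores=$cores, event loops per group with the 4x multiplier=$loops")
    println("file descriptors (open, max) = ${fdUsage()}")
    println("descriptors consumed by $loops selectors = ${descriptorsUsedBySelectors(loops)}")
}
```

Running this on a failing machine and on a passing one, and comparing the reported maximum, would show whether the limit differs between them. Logging `fdUsage()` before and after each IT might also help with question 2: if the open count keeps climbing across tests, event loop groups from earlier tests are probably not being shut down.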