oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/
Other
4.36k stars 748 forks source link

testComplexSuggestions2, testGetPopularityDataAll sometimes fails #2704

Open vladak opened 5 years ago

vladak commented 5 years ago
[ERROR]   SuggesterControllerTest.testComplexSuggestions2:407 
Expected: iterable over ["main", "mutablelistof"] in any order
     but: No item matches: "main", "mutablelistof" in []
vladak commented 5 years ago

like in https://travis-ci.org/oracle/opengrok/jobs/499311368 but I have also seen the failures in AppVeyor builds.

ahornace commented 5 years ago

From my experience it fails because of this race condition which is bugging us for a long time: https://github.com/oracle/opengrok/issues/2030

The job you provided is still running and I see no logs :/

vladak commented 5 years ago

Sorry, just restarted the build because this was a release build.

Dne st 27. 2. 2019 17:01 uživatel Adam Hornáček notifications@github.com napsal:

From my experience it fails because of this race condition which is bugging us for a long time: #2030 https://github.com/oracle/opengrok/issues/2030

The link you provided is still running and I see no logs :/

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/oracle/opengrok/issues/2704#issuecomment-467920134, or mute the thread https://github.com/notifications/unsubscribe-auth/ACzGDGslvbm4dxSosDInUKTR1Lt2TtaTks5vRqvCgaJpZM4bU2fm .

vladak commented 4 years ago

Getting more failure in SuggesterControllerTest on Windows:

vladak commented 4 years ago

The before method of the test has timeout set to 15 seconds:

          await().atMost(15, TimeUnit.SECONDS).until(() ->
                  getSuggesterProjectDataSize() == env.getProjectList().size());

which makes me wonder whether this is enough and how/if Awaitility provides any feedback/log if the timeout expired and the condition is not met.

vladak commented 4 years ago

Also, for both the failed cases listed above the setup of the test reported:

Jan 16, 2020 9:09:10 AM org.opengrok.suggest.SuggesterProjectData initSearchCountMap
SEVERE: Could not create ChronicleMap, most popular completion disabled, if you are using JDK9+ make sure to specify: --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/sun.nio.ch=ALL-UNNAMED
java.lang.IllegalArgumentException: Average key size must be a positive, finite number
    at net.openhft.chronicle.map.ChronicleMapBuilder.checkAverageSize(ChronicleMapBuilder.java:314)
    at net.openhft.chronicle.map.ChronicleMapBuilder.averageKeySize(ChronicleMapBuilder.java:531)
    at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.<init>(ChronicleMapAdapter.java:52)
    at org.opengrok.suggest.SuggesterProjectData.initSearchCountMap(SuggesterProjectData.java:300)
    at org.opengrok.suggest.SuggesterProjectData.init(SuggesterProjectData.java:158)
    at org.opengrok.suggest.Suggester.lambda$getInitRunnable$1(Suggester.java:185)
    at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

For completeness, this was on Windows with OpenJDK 8.

vladak commented 4 years ago

w.r.t. suggester timeouts: both failed cases listed above had:

INFO: Suggester successfully initialized (took 4 seconds)

so it seems the 15 seconds is enough.

vladak commented 4 years ago

The question is whether the failure to initialize Chronicle map can lead to the observed test case failures, i.e. if there is a implicit dependency. Given that the failed test cases do test popularity, this seems plausible.

vladak commented 4 years ago

added a catch for IllegalArgumentException in initSearchCountMap() and for the failed case it reported the value of conf.getAverageKeySize() like so:

SEVERE: Could not create ChronicleMap due to invalid key size: NaN
vladak commented 4 years ago

Added a "debug" print to save() and it verifies @ahornace's hypothesis that the data in the map.cfg file already contains the NaN:

Mar 06, 2020 4:02:37 PM org.opengrok.suggest.popular.impl.chronicle.ChronicleMapConfiguration save
INFO: Saving chronicle map configuration in directory C:\Users\travis\AppData\Local\Temp\data9188628379176297630\suggester\json for field 'refs': entries: 0 , average key size: NaN
Mar 06, 2020 4:02:37 PM org.opengrok.suggest.SuggesterProjectData initSearchCountMap
SEVERE: Could not create ChronicleMap from directory C:\Users\travis\AppData\Local\Temp\data9188628379176297630\suggester\json for field refs due to invalid key size: entries: 0 , average key size: NaN
java.lang.IllegalArgumentException: Average key size must be a positive, finite number
    at net.openhft.chronicle.map.ChronicleMapBuilder.checkAverageSize(ChronicleMapBuilder.java:314)
    at net.openhft.chronicle.map.ChronicleMapBuilder.averageKeySize(ChronicleMapBuilder.java:531)
    at org.opengrok.suggest.popular.impl.chronicle.ChronicleMapAdapter.<init>(ChronicleMapAdapter.java:52)
    at org.opengrok.suggest.SuggesterProjectData.initSearchCountMap(SuggesterProjectData.java:300)
    at org.opengrok.suggest.SuggesterProjectData.init(SuggesterProjectData.java:158)
    at org.opengrok.suggest.Suggester.lambda$getInitRunnable$1(Suggester.java:185)
    at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
vladak commented 4 years ago

Both numbers on the right side of the expression here https://github.com/oracle/opengrok/blob/1a0db5e6d30cea2cbce9fec33d2acfd09e64a492/suggester/src/main/java/org/opengrok/suggest/SuggesterProjectData.java#L264 are 0 for the failed case, thus giving NaN for the double value (that will be represented as Double in the averageLengths map).

vladak commented 4 years ago

The workaround could be not to put the entry to the averageLengths map if the numeric value is NaN. The getAverageLength() used in initSearchCountMap() will return default (non zero) value in such case.

vladak commented 4 years ago

Interestingly the setup/tests seems to fail only when failing to build average lengths map for field refs for the testdata/sources/json directory.

vladak commented 4 years ago

Actually, the ChronicleMapAdapter() init actually fails on Linux builds too, it just does not lead to test failure. Either the init failure is really just a red herring or is interpreted differently on Windows builds. Anyhow, this should be fixed as suggester above - if there is no index data for particular field it should be ignored w.r.t. averge key lengths.

vladak commented 4 years ago

The popularity tests fail because the term count cannot be updated using SuggesterServiceImpl.getInstance().increaseSearchCount() due to rebuild being in progress.

vladak commented 4 years ago

It seems that the trouble is that SuggesterControllerTest#before() waits merely for suggester to be initialized (Suggester#projectData to be filled) while lots of the test cases are sensitive to suggester rebuild (rather than init).

In one of the failed cases the log goes like this:

Mar 23, 2020 4:11:25 PM org.opengrok.suggest.Suggester shutdownAndAwaitTermination
INFO: Suggester successfully initialized (took 5 seconds)
Mar 23, 2020 4:11:25 PM org.opengrok.web.api.v1.suggester.provider.service.impl.SuggesterServiceImpl scheduleRebuild
INFO: Suggester rebuild not scheduled
Mar 23, 2020 4:11:26 PM org.opengrok.suggest.Suggester rebuild
INFO: Rebuilding the following suggesters: [bug16465, archives, data, scala, clojure, document, pascal, bug19195, issue10, sql, rust, csharp, java, fortran, json, perl, tcl, swift, haskell, python, c, kotlin, lisp, rfe2575, bug16070, javascript, script, bug3658, php, scopes, bug14924, bug3430]
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:60342]
Mar 23, 2020 4:11:26 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:62271/
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:62271]
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-1] Started.
Mar 23, 2020 4:11:26 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:62271]
Mar 23, 2020 4:11:27 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:57989/
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:57989]
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-2] Started.
Mar 23, 2020 4:11:27 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:57989]
Mar 23, 2020 4:11:27 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:56937/
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:56937]
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-3] Started.
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [localhost:56937]
Mar 23, 2020 4:11:28 PM org.glassfish.jersey.test.grizzly.GrizzlyTestContainerFactory$GrizzlyTestContainer <init>
INFO: Creating GrizzlyTestContainer configured at the base URI http://localhost:54678/
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:54678]
Mar 23, 2020 4:11:28 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer-4] Started.
[ERROR] Tests run: 37, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 77.04 s <<< FAILURE! - in org.opengrok.web.api.v1.controller.SuggesterControllerTest
[ERROR] testGetSuggestionsSimplePath  Time elapsed: 0.297 s  <<< FAILURE!
java.lang.AssertionError: 
Expected: iterable containing ["header"]
     but: No item matched: "header"
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.junit.Assert.assertThat(Assert.java:956)
    at org.junit.Assert.assertThat(Assert.java:923)
    at org.opengrok.web.api.v1.controller.SuggesterControllerTest.testGetSuggestionsSimplePath(SuggesterControllerTest.java:220)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.ja
        ...

... lots of messages about Grizzly HTTP server start and shutdown

Mar 23, 2020 4:11:35 PM org.opengrok.suggest.Suggester shutdownAndAwaitTermination
INFO: Suggesters for [bug16465, archives, data, scala, clojure, document, pascal, bug19195, issue10, sql, rust, csharp, java, fortran, json, perl, tcl, swift, haskell, python, c, kotlin, lisp, rfe2575, bug16070, javascript, script, bug3658, php, scopes, bug14924, bug3430] were successfully rebuilt (took 9 seconds)

We can see this sequence of events:

vladak commented 4 years ago

When TimeUnit.SECONDS.sleep(15); was added to @BeforeClass the failures can no longer be seen.

vladak commented 4 years ago

I found out that SuggesterControllerTest enforces the order of test case execution by using @FixMethodOrder(MethodSorters.NAME_ASCENDING). There are 2 test cases that trigger scheduler rebuild - ZtestRebuild (rebuilds all projects, takes bunch of seconds), ZtestRebuildProject (rebuilds just the c project, nearly instantaneous).

When the SuggesterControllerTest is run in Travis or in IDEA locally on my Linux workstation (with en_US.UTF-8 locale) the Z tests actually get run first. I commented out the Z tests and the tests still pass. So, not sure why this ordering is in place, @ahornace ?

Anyhow, either ordering is not good - if the Z tests get run first they might interfere with the tests in SuggesterControllerTest (as demonstrated in this issue), if they get to run last they might interfere with tests in another test class since the suggester rebuild thread is asynchronous.