Larger tests become flaky

Nereboss commented 4 months ago

Describe the bug

Tests for a function that calls multiple extension methods become flaky.

In our application we have multiple parsers that each have a dialog which consists of multiple questions. All questions are tested separately from another and no test there has ever been flaky.

Now the dialog of a parser is one function that calls multiple questions, a bit like this:

Dialog { result1 = question1() result2 = question1() return result 1 + result 2 }

However, the tests for that dialog function become flaky and it looks like the terminal is somehow "swallowing keys" with the starting keys of the user input missing.

To Reproduce

I wrote a minimal example to make this issue reproducible:

We have extension functions for the different questions like this

fun Session.question1(callback: suspend RunScope.() -> Unit): String {
    var userInput = "-1"
    section {
        textLine("render some text and then display the input:")
        input()
    }.runUntilSignal {
        onInputEntered {
            signal()
            userInput = input
        }
        callback()
    }
    return userInput
}

Tests for this function alone are not flaky, here is a test to try it out:

@Test
    fun `does question1 ever become flaky`() {
        val testString = "Long input that should test if the length has any effect on anything"
        var result: String
        testSession { terminal ->
            result = question1(callback = {
                terminal.type(testString)
                terminal.press(Keys.ENTER)
            })
            assertThat(result).isEqualTo(testString)
        }
    }

Now we build a dialog from multiple questions with a function like this (the code of question2 is identical to the above shown question1):

fun Session.dialog(
    callback: suspend RunScope.() -> Unit = {},
): String {
    val result1: String = question1(callback)
    val result2: String = question2(callback)

    return "$result1|||$result2"
}

Tests for this dialog function are flaky with the start of the simulated input being cut off (more often for the 2nd question than the first but it does happen for the first as well). Here is such a test to try it out:

@Test
fun `test if calling the function that calls fun1 from here becomes flaky`() {
    val testString = "Long input that should test if the length has any effect on anything"
    var result: String
    testSession { terminal ->
        result = dialog(callback = {
            terminal.type(testString)
            terminal.press(Keys.ENTER)
        })
        assertThat(result).isEqualTo("$testString|||$testString")
    }
}

It seems like the tests fail more often when the CPU is busy. They also fail more on weaker hardware. To make it more easily reproducible, we pasted the test for the dialog 6 times into a testclass. That way it most of the time fails within the first 10 executions.

Expected behavior The tests are not flaky.

Screenshots Example of the first 2 letters of the input for the 2nd question being "swallowed"

Desktop (please complete the following information):

OS: MacOS (also tested on Windows with the same result)
Version 1.1.3-SNAPSHOT

Additional context As a testing environment we used Junit5, Gradle8 and Kotter version 1.9.23. Annotating the tests with @Execution(ExecutionMode.SAME_THREAD) also did not influence the result.

Nereboss commented 4 months ago

@bitspittle friendly poke in case your email settings are still swallowing issues ^^

bitspittle commented 4 months ago

Thanks for the poke! I did indeed miss it, and you are welcome to poke like this anytime I don't respond within 48 hours.

I probably won't be able to take a close look at this until after my talk on June 7 (well and a few days after, which I'll be spending with friends in the Bay Area), but my intention is to prioritize after that. Worst case there's actually a threading issue in core Kotter itself, which would be worth understanding.

I do appreciate you bringing this up; I won't push out 1.1.3 until we either fix this or understand it better.

bitspittle commented 3 months ago

Just an FYI, finally back from my talk and getting back into working on stuff again. Will probably be busy today and tomorrow but plan on looking into this flaky issue this week. Thanks for your patience!

bitspittle commented 3 months ago

I am able to repro this issue locally. Coincidentally enough, it may be related to the same bug opened by another user around this same time, see also: https://github.com/varabyte/kotter/issues/116

Hopefully we can figure things out tomorrow. It sure seems like occasionally the keyboard input flow from the previous section doesn't shut down early enough and therefore the next section tries to start up and therefore briefly sees some of its keys getting swallowed.

"Good news" is that this is the sort of thing that in practice would only happen at the speeds of a computer running tests in a tight loop; it would not be expected for any user to notice this themselves (and even if they somehow did, it would be a single character missed at the beginning of what they were typing, which would be easy to retype)

bitspittle commented 3 months ago

Been a week since my last reply, but I got hit with a bunch of Kobweb stuff, sorry for yet another delay! Finally had some time to dig today.

So looking into this, this is really tricky. The issue is fundamentally dealing with subhuman speeds and coroutine timing. Basically when you call input we do two things:

1) create a Kotlin flow that read bytes from the terminal and convert them to Kotter Key instances. This flow is tied to the session lifecycle. 2) launch a Kotlin coroutine job that collects input Key events and updates a bunch of state that ends up in the final output that you see rendered on the screen, e.g. this is where cursor presses, the ENTER key, and etc. are handled. This work is tied to the section lifecycle.

With the flake you are seeing, what's happening is:

You start the first section, which spins up the "byte to Key" flow and shortly after launches the "process Key" job.
Eventually, the first section exits. The "byte to Key" flow is left alive while the "process Key" job is cancelled.
You run the second section, which launches a second "process Key" job. During this time, you immediately start sending keypresses.
The "byte to Key" flow handles these key presses, converting them to Key instances and sending them off. However...
The "process Key" job is not ready yet, so some of those initial Key events are dropped in that brief window.

I'm still hoping to fix this in a clean way, but so far my efforts have all failed.

I tried tying the "byte to Key" flow to the section lifecycle instead of the session lifecycle, but it's not easy -- it's tricky to cancel and then restart the flow. Meanwhile, tying the "process Key" job to the session lifecycle caused at least one of my other input tests to hang. Neat.

If you want a temporary fix, you can do something like this:

        onInputEntered {
            signal()
            userInput = input
        }
+       delay(16) // Give Kotter some time to set up `input` machinery
        callback()

varabyte / kotter

Larger tests become flaky #114