teemtee / tmt

Test Management Tool
MIT License
80 stars 122 forks source link

bug: tmt try multihost plan breaks terminal #3138

Open spetrosi opened 1 month ago

spetrosi commented 1 month ago

After running a plan with multi-host provisioning with 3+ hosts, terminal "breaks", it prints many spaces in the beginning of the lines, and then doesn't work. With two nodes, the output is fine. With >3 nodes, it starts shifting.

Plan from plans/test_plan.fmf:

summary: A general test for a system role
tag: general_test
provision:
  - name: control_node
    role: control_node
    connection: system
  - name: managed_node1
    role: managed_node
    connection: system
  - name: managed_node2
    role: managed_node
    connection: system
  - name: managed_node3
    role: managed_node
    connection: system
environment:
    ANSIBLE_VER: 2.17
    REPO_NAME: "mssql"
    PYTHON_VERSION: 3.12
prepare:
  # Replace with feature: epel: enabled once https://github.com/teemtee/tmt/pull/3128 is merged
  - name: Enable epel to install beakerlib
    script: |
      # CS 10 and Fedora doesn't require epel
      if grep -q -e 'CentOS Stream release 10' -e 'Fedora release' /etc/redhat-release; then
        exit 0
      fi
      yum install epel-release yum-utils -y
      yum-config-manager --enable epel epel-debuginfo epel-source
discover:
  - name: Prepare managed nodes
    how: fmf
    url: https://github.com/linux-system-roles/tft-tests
    ref: main
    where: managed_node
    filter: tag:prep_managed_node
execute:
    how: tmt

I am running it with

tmt try -p plans/test_plan centos-stream-9

What my terminal outputs: https://pastebin.com/Ae18humU

Terminal breaks after Running 'rlImport library' line

spetrosi commented 1 month ago
$ tmt --version
tmt version: 1.34.0
spetrosi commented 4 weeks ago

I found a workaround - when test plan begins, I type alt+Enter, ctrl+Enter, Enter several times and then the plan runs smoothly, bringing me to the interactive menu as expected.

spetrosi commented 3 weeks ago

I am providing a simpler reproducer, a plan that provisions 3 nodes, and a simple test. It enough to create two files like below and run tmt try -p plans/reproducer.

plans/reproducer.fmf:

summary: Reproduce bug with extra indentation
provision:
  - name: reproducer1
    role: reproducer
  - name: reproducer2
    role: reproducer
  - name: reproducer3
    role: reproducer
discover:
  - name: Run on 3 nodes
    how: fmf
    where: reproducer
    filter: tag:reproducer
execute:
    how: tmt

tests/reproducer/main.fmf:

summary: Print date for test purposes
framework: shell
duration: 1m
tag: reproducer
test: date

I noticed what exactly causes this - using where keyword to specify a role that has more than 2 nodes.

This issue blocks work on multi-node scenarios that use more than 2 nodes.

spetrosi commented 3 weeks ago

reproducer.log

happz commented 3 weeks ago

@spetrosi can you try with the following patch applied?

diff --git a/tmt/log.py b/tmt/log.py
index 66db71c5..18d848b9 100644
--- a/tmt/log.py
+++ b/tmt/log.py
@@ -287,7 +287,7 @@ class LogfileHandler(logging.FileHandler):
 # with older Python versions. Since it's not critical to mark the handler as "str only", we can
 # ignore the issue for now.
 class ConsoleHandler(logging.StreamHandler):  # type: ignore[type-arg]
-    pass
+    terminator = '\r\n'

 class _Formatter(logging.Formatter):
spetrosi commented 3 weeks ago

@happz it definitelly helps, but not everywhere. With the reproducer, it helps in the test phase, but the interactive menu is still broken. And after run finishes, I am not able to type in this terminal.

                       test          rediscover tests and execute them again
                                                                                login         log into the guest for experimenting
                                                                                                                                      verbose       set the desired level of verbosity
                                                                                                                                                                                          debug         choose a different debugging level

                                                                                                                                                                                                                                              discover      gather information about tests to be executed
                                                               prepare       prepare the environment for testing
                                                                                                                    execute       run tests using the specified executor
                                                                                                                                                                            report        provide test results overview and send reports
                                                                                                                                                                                                                                            finish        perform the finishing tasks, clean up guests

                                                            keep          exit the session but keep the run for later use
                                                                                                                             quit          clean up the run and quit the session

                                                                                                                                                                                > 
happz commented 3 weeks ago

Is there anything special, or uncommon about your terminal settings? Because I'm unable to reproduce the problem, and since it looks like at some point, \n stops being enough for your terminal & \r would be needed, it's not like tmt suddenly stops emitting different line ends. So it's really puzzling and hard to debug, and I'm starting to think whether the problem is in tmt at all.

happz commented 3 weeks ago

Never mind, I just reproduced the problem. FTR, it works perfectly fine with container, but with virtual, I observe the reported behavior.

spetrosi commented 3 weeks ago

Right, I am using virtual. Recently I tried using different terminals or a terminal in vscodium with same results. I too had a feeling that it may an issue on my side.

happz commented 3 weeks ago

So, running stty sane after the test command quits makes things right again. I suspected there must be something breaking the terminal, messing with its settings, like echo and so on, which would make sense, tmt try is running those commands in interactive mode, test commands share stdout & stderr with their parent, tmt, so it's possible they might mess it up.

A couple of notes to think about:

I would even guess this might be simply the cause here, 3 SSH processes fighting over 1 terminal. Sooner or later, one of them does A and the other one undoes it, and we end up with an inconsistent setup. We can add stty sane, but I'd say we need to think about whether it's possible to consider it a fix, or whether it's just one part of the solution: stty sane + recommendation on what happens with multihost and ANSI control codes + option to disable interactive + ... And even the simplest beakerlib test uses colors and runs over SSH...

With the interactive mode disabled in tmt try, it just works, no stty sane needed - they no longer fight over shared terminal... And even basic Beakerlib tests print out colors...

happz commented 3 weeks ago

3 trivial Beakerlib tests + stty sane - yes, eventually, when tests are done, you get tmt try prompt back, and you can control tmt and input values. But the output is useless.

    execute
        queued execute task #1: Run on 3 nodes on reproducer1 (reproducer), reproducer2 (reproducer) and reproducer3 (reproducer)

        execute task #1: Run on 3 nodes on reproducer1 (reproducer), reproducer2 (reproducer) and reproducer3 (reproducer)
[reproducer3 (reproducer)]         started
[reproducer3 (reproducer)]         how: tmt
[reproducer3 (reproducer)]         order: 50
[reproducer2 (reproducer)]         started
[reproducer3 (reproducer)]         exit-first: false
[reproducer2 (reproducer)]         how: tmt
[reproducer1 (reproducer)]         started
[reproducer2 (reproducer)]         order: 50
[reproducer1 (reproducer)]         how: tmt
[reproducer2 (reproducer)]         exit-first: false
[reproducer1 (reproducer)]         order: 50
[reproducer1 (reproducer)]         exit-first: false
[reproducer2 (reproducer)]             test: Print date for test purposes
[reproducer1 (reproducer)]             test: Print date for test purposes
[reproducer3 (reproducer)]             test: Print date for test purposes
[reproducer2 (reproducer)]         warn: Ignoring requested duration, not supported in interactive mode.
[reproducer2 (reproducer)]                 cmd: ./test.sh
[reproducer1 (reproducer)]         warn: Ignoring requested duration, not supported in interactive mode.
[reproducer1 (reproducer)]                 cmd: ./test.sh
[reproducer3 (reproducer)]         warn: Ignoring requested duration, not supported in interactive mode.
[reproducer3 (reproducer)]                 cmd: ./test.sh

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Setup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 0s
::   Assertions: 0 good, 0 bad
::   RESULT: PASS (Setup)

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
                                                                                ::   Setup
                                                                                          ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

                                                                                                                                                                          ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
                                       ::   Setup
                                                 ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

                                                                                                                                 ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
                                                                                                                                                                                                                 ::   Duration: 0s
               ::   Assertions: 0 good, 0 bad
                                             ::   RESULT: PASS (Setup)

                                                                      ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
                                                                                                                                                      ::   Duration: 0s
                                                                                                                                                                       ::   Assertions: 0 good, 0 bad
                                                                                                                                                                                                     ::   RESULT: PASS (Setup)

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   TEST PROTOCOL
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

    Package       : unknown
    beakerlib RPM : beakerlib-1.29.3-5.fc40.noarch
    Test name     : 
    Test started  : 2024-08-16 20:14:12 UTC
    Test finished : 2024-08-16 20:14:13 UTC (still running)
    Test duration : 1 seconds
    Distro        : Fedora release 40 (Forty)
    Hostname      : reproducer2
    Architecture  : unknown
    CPUs          : 2 x Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz
    RAM size      : 1914 MB
    HDD size      : 117.80 GB

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Setup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 0s
::   Assertions: 0 good, 0 bad
::   RESULT: PASS (Setup)

Shared connection to 127.0.0.1 closed.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   TEST PROTOCOL
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

    Package       : unknown
    beakerlib RPM : beakerlib-1.29.3-5.fc40.noarch
    Test name     : 
    Test started  : 2024-08-16 20:14:13 UTC
    Test finished : 2024-08-16 20:14:13 UTC (still running)
    Test duration : 0 seconds
    Distro        : Fedora release 40 (Forty)
    Hostname      : reproducer1
    Architecture  : unknown
    CPUs          : 2 x Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz
    RAM size      : 1914 MB
    HDD size      : 117.80 GB

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Setup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 0s
::   Assertions: 0 good, 0 bad
::   RESULT: PASS (Setup)

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   TEST PROTOCOL
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

    Package       : unknown
    beakerlib RPM : beakerlib-1.29.3-5.fc40.noarch
    Test name     : 
    Test started  : 2024-08-16 20:14:13 UTC
    Test finished : 2024-08-16 20:14:13 UTC (still running)
    Test duration : 0 seconds
    Distro        : Fedora release 40 (Forty)
    Hostname      : reproducer3
    Architecture  : unknown
    CPUs          : 2 x Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz
    RAM size      : 1914 MB
    HDD size      : 117.80 GB

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Setup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 0s
::   Assertions: 0 good, 0 bad
::   RESULT: PASS (Setup)

Shared connection to 127.0.0.1 closed.
Shared connection to 127.0.0.1 closed.
[reproducer2 (reproducer)]                 00:00:00 errr /tests/reproducer (on reproducer2 (reproducer)) (beakerlib: State 'incomplete') [1/1]

[reproducer1 (reproducer)]                 00:00:00 errr /tests/reproducer (on reproducer1 (reproducer)) (beakerlib: State 'incomplete') [1/1]

[reproducer3 (reproducer)]                 00:00:00 errr /tests/reproducer (on reproducer3 (reproducer)) (beakerlib: State 'incomplete') [1/1]

[reproducer2 (reproducer)]         finished
[reproducer1 (reproducer)]         finished
[reproducer3 (reproducer)]         finished

        summary: 3 tests executed
spetrosi commented 3 weeks ago

Many thanks for finding the cause of this. stty sane - very smart, I didn't know that.

We are running tests in testing farm via https://github.com/marketplace/actions/schedule-tests-on-testing-farm, e.g. in https://github.com/linux-system-roles/ha_cluster/actions/runs/10450150449/job/28933928537. The log in the testing farm is printed correctly. Maybe in a multihost scenario tmt try might avoid interactively printing test outputs on the run, but instead save it to a log and then print the whole log, which is formatted correctly, once the test is ready. And then exit to the interactive menu. Just thinking out loud.

happz commented 3 weeks ago

Many thanks for finding the cause of this. stty sane - very smart, I didn't know that.

We are running tests in testing farm via https://github.com/marketplace/actions/schedule-tests-on-testing-farm, e.g. in https://github.com/linux-system-roles/ha_cluster/actions/runs/10450150449/job/28933928537. The log in the testing farm is printed correctly.

That is expected, because TF does not run them in interactive mode, and they don't share stdout/stderr channels.

Maybe in a multihost scenario tmt try might avoid interactively printing test outputs on the run, but instead save it to a log and then print the whole log, which is formatted correctly, once the test is ready. And then exit to the interactive menu. Just thinking out loud.

Yes, something along those lines. I would dare to suggest disabling interactive mode in multihost scenarios by default - on top of output formatting, how about shared stdin? All three branches of testing share it, if the user wants to interact with the test, which one of them? The one on guest 1, or the one on guest 2? Maybe the one on guest 3 which is still running? This quickly turns into getting ugly output & not being able to use input reliably either.

Maybe we need some kind of window, like ncurses windows: split the screen, have a window for the menu and a window for each guest, separate their output, and let users switch between windows when talking to their tests. In any case, shared stdout and/or stdin will cause trouble.

spetrosi commented 3 weeks ago

Is it possible to use tmt run plans to do the following?

  1. Run the plan and optionally keep the guests to be able to rerun a given test against running guests
  2. Print the result of tests to a file
  3. Not use interactive mode. I did run tmt run plans --filter 'tag:mytag', where logs are stored in this case?