gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

GDB_VERSION is not parsed as expected on RockyLinux and AlmaLinux systems, causing the libos tests to fail #2038

Closed anjalirai-intel closed 1 month ago

anjalirai-intel commented 1 month ago

Description of the problem

GDB_VERSION is not parsed as expected in the regression.py file on RockyLinux and AlmaLinux systems, which causes the libos tests to fail.

Steps to reproduce

The code snippet that computes GDB_VERSION was taken from graminelibos/regression.py and saved as check_gdb.py:

import subprocess

try:
    GDB_VERSION = tuple(int(i) if i.isdigit() else i for i in subprocess.check_output(
        ['gdb', '-q', '-ex', 'python print(gdb.VERSION)', '-ex', 'q']
    ).strip().decode('ascii').split('.'))
except (subprocess.SubprocessError, OSError):
    GDB_VERSION = None

print("GDB_VERSION: ", GDB_VERSION, type(GDB_VERSION))
print(GDB_VERSION < (13,))

Ubuntu 24.04:

$ python3 check_gdb.py
GDB_VERSION:  (15, 0, 50, '20240403-git') <class 'tuple'>
False

RockyLinux:

$ python3 check_gdb.py
GDB_VERSION:  ('Rocky Linux 10', '2-13', 'el9') <class 'tuple'>
Traceback (most recent call last):
  File "/intel/check_gdb.py", line 11, in <module>
    print(GDB_VERSION < (13,))
TypeError: '<' not supported between instances of 'str' and 'int'

The error occurs because the first element of the GDB_VERSION tuple is a string ('Rocky Linux 10') rather than an integer, so comparing it with the integer 13 raises a TypeError.
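The failure can be reproduced without gdb installed. The sketch below applies the same parsing expression to hard-coded strings. The raw strings are an assumption: they are reconstructed by joining the tuples printed above with '.', and `parse` is a hypothetical helper name, not something from regression.py.

```python
# Same parsing expression as in check_gdb.py, applied to a fixed string
# instead of the output of gdb. `parse` is a hypothetical helper name.
def parse(version):
    return tuple(int(i) if i.isdigit() else i for i in version.split('.'))

# Assumed raw strings, reconstructed from the tuples printed in this issue.
ubuntu = parse('15.0.50.20240403-git')
rocky = parse('Rocky Linux 10.2-13.el9')

print(ubuntu)          # (15, 0, 50, '20240403-git')
print(rocky)           # ('Rocky Linux 10', '2-13', 'el9')
print(ubuntu < (13,))  # False: the comparison is decided by 15 < 13

try:
    rocky < (13,)      # compares 'Rocky Linux 10' with 13
except TypeError as exc:
    print('TypeError:', exc)
```

Tuple comparison is element-wise, so only the first element matters here: an int on Ubuntu, a str on the RHEL-like distros.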

AlmaLinux:

[root@574c389eaaf5 /]# python3 check_gdb.py
GDB_VERSION:  ('Red Hat Enterprise Linux 10', '2-13', 'el9') <class 'tuple'>
Traceback (most recent call last):
  File "//check_gdb.py", line 11, in <module>
    print(GDB_VERSION < (13,))
TypeError: '<' not supported between instances of 'str' and 'int'

[root@574c389eaaf5 /]# cat /etc/os-release
NAME="AlmaLinux"
VERSION="9.4 (Seafoam Ocelot)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="AlmaLinux 9.4 (Seafoam Ocelot)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:9::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
SUPPORT_END=2032-06-01

Expected results

LibOS tests should run successfully.

Actual results

python3 -m pytest -v -k 'not attestation' --junit-xml libos-regression.xml
============================= test session starts ==============================
platform linux -- Python 3.9.18, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/intel/jenkins/workspace/local_ci_graphene_sgx_almalinux_server_6.2/gramine, configfile: pytest.ini
collecting ... collected 1 item / 1 error

==================================== ERRORS ====================================
_____________ ERROR collecting libos/test/regression/test_libos.py _____________
test_libos.py:1397: in <module>
    class TC_50_GDB(RegressionTestCase):
test_libos.py:1470: in TC_50_GDB
    ???
E   TypeError: '<' not supported between instances of 'str' and 'int'
- generated xml file: /home/intel/jenkins/workspace/local_ci_graphene_sgx_almalinux_server_6.2/gramine/libos/test/regression/libos-regression.xml -
=========================== short test summary info ============================
ERROR test_libos.py - TypeError: '<' not supported between instances of 'str'...
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 0.57s ===============================

Gramine commit hash

971f1a77556b5215ffffe8eb11dd6e68fba18a87

dimakuv commented 1 month ago

Quick summary of the above issue: some OS distros print the first part of GDB's version not as a simple (major version) number, but as a long-ish string: Rocky Linux 10, Red Hat Enterprise Linux 10. This breaks the comparison code GDB_VERSION < (13,), as we can't compare this string to an integer.

I think this shouldn't be a blocker for the release for two reasons:

  1. This happens only on RockyLinux and AlmaLinux (and well, I guess also on RHEL).
  2. This is only about LibOS regression tests, which are not included in our packages.

@mkow @woju @kailun-qin What do you think? Can we release v1.8 without a fix for this issue?


I think an ad-hoc fix would look like this:

$ git diff
diff --git a/libos/test/regression/test_libos.py b/libos/test/regression/test_libos.py
index 54ed0905..0c9fdd66 100644
--- a/libos/test/regression/test_libos.py
+++ b/libos/test/regression/test_libos.py
@@ -1467,7 +1467,7 @@ class TC_50_GDB(RegressionTestCase):
     # uses) non-main threads in the parent process get stuck in "tracing stop"
     # state after vfork+execve. This test uses gdb and unfortunately triggers
     # the bug.
-    @unittest.skipUnless(GDB_VERSION is not None and GDB_VERSION < (13,),
+    @unittest.skipUnless(GDB_VERSION is not None and tuple(int(s) for s in GDB_VERSION[0].split() if s.isdigit()) < (13,),
         f'missing or known buggy GDB ({GDB_VERSION=})')
     def test_020_gdb_fork_and_access_file_bug(self):
         # To run this test manually, use:
diff --git a/python/graminelibos/regression.py b/python/graminelibos/regression.py
index 65b3c832..e8831772 100644
--- a/python/graminelibos/regression.py
+++ b/python/graminelibos/regression.py
@@ -23,7 +23,7 @@ IS_VM = os.environ.get('IS_VM') == '1'
 ON_X86 = os.uname().machine in ['x86_64']
 USES_MUSL = os.environ.get('GRAMINE_MUSL') == '1'
 try:
-    GDB_VERSION = tuple(int(i) if i.isdigit() else i for i in subprocess.check_output(
+    GDB_VERSION = tuple(i for i in subprocess.check_output(
         ['gdb', '-q', '-ex', 'python print(gdb.VERSION)', '-ex', 'q']
     ).strip().decode('ascii').split('.'))
 except (subprocess.SubprocessError, OSError):

(Sorry for terrible Python, I just wanted to make this work.)

I.e., we don't compare the whole GDB_VERSION[0] string; instead, we extract the integer contained in it and compare that. This should work for all the cases described above.
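As a standalone illustration of the expression used in the diff above, here is a minimal sketch. `major_ints` is a hypothetical name, and the inputs are the first tuple elements reported in this issue, kept as strings the way the patched regression.py would leave them.

```python
# Hypothetical helper wrapping the expression from the ad-hoc diff:
# keep only the purely numeric whitespace-separated tokens of the
# first version component and turn them into a tuple of ints.
def major_ints(first_component):
    return tuple(int(s) for s in first_component.split() if s.isdigit())

for first in ('15', 'Rocky Linux 10', 'Red Hat Enterprise Linux 10'):
    ints = major_ints(first)
    print(repr(first), ints, ints < (13,))
```

One caveat of this approach: if the first component has no numeric token at all, the result is an empty tuple, and `() < (13,)` is True, so such a GDB would be treated as older than 13.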

woju commented 1 month ago

@dimakuv I think we can split just the last section separated by whitespace. See #2041 for attempted fix.
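One possible reading of this suggestion, sketched below: take only the last whitespace-separated token of the version string and parse that as before. This is not the code from #2041; `parse_gdb_version` is a hypothetical name, and the raw strings are assumptions reconstructed from the tuples printed in this issue.

```python
# Hypothetical sketch: drop any distro prefix such as "Rocky Linux" by
# keeping only the last whitespace-separated token, then split on dots
# and convert the numeric parts to int as the original code does.
def parse_gdb_version(raw):
    last = raw.strip().split()[-1]
    return tuple(int(i) if i.isdigit() else i for i in last.split('.'))

for raw in ('15.0.50.20240403-git',
            'Rocky Linux 10.2-13.el9',
            'Red Hat Enterprise Linux 10.2-13.el9'):
    version = parse_gdb_version(raw)
    print(version, version < (13,))
```

With this variant the first tuple element is an int in all three cases, so the existing `GDB_VERSION < (13,)` check in test_libos.py would not need to change. Note that an empty version string would raise IndexError in this sketch.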