python / cpython

The Python programming language
https://www.python.org

Multiple test failures on Alpine 3.15 / musl-1.2.2-r7 #90548

Open tiran opened 2 years ago

tiran commented 2 years ago
BPO 46390
Nosy @brettcannon, @terryjreedy, @tiran, @zware, @ncopa
Files
  • alpine315-tests.txt
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['type-bug', 'tests', 'build', '3.11']
    title = 'Multiple test failures on Alpine 3.15 / musl-1.2.2-r7'
    updated_at =
    user = 'https://github.com/tiran'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'brett.cannon'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Build', 'Tests']
    creation =
    creator = 'christian.heimes'
    dependencies = []
    files = ['50566']
    hgrepos = []
    issue_num = 46390
    keywords = []
    message_count = 7.0
    messages = ['410645', '410848', '410849', '411187', '411191', '411193', '411905']
    nosy_count = 5.0
    nosy_names = ['brett.cannon', 'terry.reedy', 'christian.heimes', 'zach.ware', 'ncopa']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46390'
    versions = ['Python 3.11']
    ```

    tiran commented 2 years ago

    I'm getting multiple test failures with the latest Alpine 3.15 and musl-1.2.2-r7. Some test failures may be caused by wrong assumptions in our tests; some might be bugs in musl libc.

    9 tests failed: test__locale test_c_locale_coercion test_cmd_line test_gdb test_locale test_os test_posix test_re test_selectors

    I have attached the output of

    ./python -m test -v test__locale test_c_locale_coercion test_cmd_line test_gdb test_locale test_os test_posix test_re test_selectors 2>&1 | tee alpine315-tests.txt

    You can use my container to reproduce the test failures:

    $ podman run --privileged -ti --rm -v $(pwd):/cpython:Z quay.io/tiran/cpythonbuild:alpine-3.15 /bin/sh
    # /cmd.sh
    # cd /cpython/builddep/alpine-3.15-x86_64/
    # make test

    kumaraditya303 commented 2 years ago

    These tests seem to be expected to fail on Alpine.

    See https://github.com/alpinelinux/aports/blob/b36ed9bba2fdbf49a98dfdc3377c29271525082f/main/python3/APKBUILD#L123

    tiran commented 2 years ago

    I would put it differently: The package maintainer of Python on Alpine decided to ignore all test failures in these test modules.

    terryjreedy commented 2 years ago

    The first alpine315-tests.txt appears to be a truncated version of the second. Were you expecting the first to be automatically replaced? Should it be unlinked?

    https://www.alpinelinux.org "Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox." From the file linked above:

    # Maintainer: Natanael Copa <ncopa@alpinelinux.org>  ## I nosied Natanael at his CLA-signed Alpine id.
    # Contributor: Sheila Aman <sheila@vulpine.house>
    ...
    # musl related
    fail="test__locale test_locale test_strptime test_re"  # various musl locale deficiencies
    fail="$fail test_c_locale_coercion"
    fail="$fail test_datetime"  # hangs if 'tzdata' installed
    fail="$fail test_os"  # fpathconf, ttyname errno values
    fail="$fail test_posix"  # sched_[gs]etscheduler not impl
    fail="$fail test_shutil"  # lchmod, requires real unzip

    Should we change CPython tests to accommodate things that are missing (versus buggy)? Should the tests requiring sched_[gs]etscheduler be skipped if the functions are missing? Or are they required to be 'posix', and is test_posix meant to test completeness as well as correctness of what is present?

    tiran commented 2 years ago

    In my opinion we should treat these issues as Alpine / musl libc platform bugs and ask the Alpine maintainers to look into them. The tests are passing on Linux with glibc and on BSD platforms (FreeBSD, macOS, ...) with BSD libc. It is reasonable to assume that the failing tests are caused by incompatibilities or deficiencies in musl libc, or by a different interpretation of POSIX and Open Group standards.

    I would not ignore or skip any test unless we have a thorough understanding of the problem and the deviation is documented. The issue can also affect user code.

    Python's test suite is exhaustive. Our tests have found a fair number of problems in, e.g., libm. A couple of years ago one of my AF_ALG socket tests even found a kernel bug by triggering a kernel fault.

    tiran commented 2 years ago

    The comment about sched_[gs]etscheduler seems to be outdated. For one, CPython's test suite has a @requires_sched decorator that performs a check for sched_getscheduler and the kernel syscall. For another, musl libc in Alpine 3.13 and 3.15 has sched_setscheduler.
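
For readers unfamiliar with this pattern, a capability check of this kind might be sketched as follows. This is an illustration under stated assumptions, not the code in CPython's test suite: the names `supports_sched`, `requires_sched_sketch`, and `SchedExample` are hypothetical, and the sketch uses the public `os` module.

```python
import errno
import os
import unittest


def supports_sched():
    """Return True if the scheduler API is exposed and the kernel implements it."""
    if not hasattr(os, "sched_getscheduler"):
        # The platform's libc does not expose the function at all.
        return False
    try:
        os.sched_getscheduler(0)  # 0 means "the calling process"
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # The wrapper exists, but the syscall is not implemented.
            return False
    return True


# Hypothetical decorator name, chosen so it is not mistaken for CPython's own.
requires_sched_sketch = unittest.skipUnless(
    supports_sched(), "requires POSIX scheduler API"
)


class SchedExample(unittest.TestCase):
    @requires_sched_sketch
    def test_getscheduler_returns_int(self):
        # Runs only where the check above succeeded; skipped elsewhere.
        self.assertIsInstance(os.sched_getscheduler(0), int)
```

With a guard like this, a missing function leads to a reported skip rather than a failure, while a function that is present but misbehaves still fails the test.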

    zware commented 2 years ago

    BTW, we do have an Alpine buildbot worker in the unstable set, running only on the main branch: https://buildbot.python.org/all/#/workers/19

    tiran commented 2 years ago

    GH-92826 is a compiler or musl libc issue on Alpine that manifests on PPC64LE platforms.

    brettcannon commented 2 years ago

    FYI neither Alpine nor musl is covered by PEP 11 (PPC64LE is tier-2, though).

    tianon commented 3 weeks ago

    FWIW, I visually spot-checked each instance of FAIL: in https://buildbot.python.org/all/api/v2/logs/11438284/raw_inline today (the latest build from https://buildbot.python.org/all/#/workers/19 at the time of my checking), and almost all the current failures are locale-related (which is somewhat expected with musl; see https://wiki.musl-libc.org/open-issues#Locale-limitations).
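
The spot check described above was done by eye. A scripted version might look roughly like the sketch below, assuming the log uses unittest's usual `FAIL: method (dotted.test.id)` header lines. The function name `split_failures`, the rule that a test is locale-related if its id contains "locale", and the third sample line are all made up for illustration; only the first two sample lines come from this thread.

```python
import re

# unittest prints one "FAIL: <method> (<dotted test id>)" header per failure.
FAIL_RE = re.compile(r"^[ \t]*FAIL: (\S+) \(([^)]+)\)", re.MULTILINE)


def split_failures(log_text):
    """Split failing test ids into (locale_related, other).

    Assumption: a test is treated as locale-related when its dotted id
    contains the substring "locale". This is a heuristic, not a rule
    taken from the buildbot or from CPython.
    """
    locale_related, other = [], []
    for match in FAIL_RE.finditer(log_text):
        test_id = match.group(2)
        if "locale" in test_id.lower():
            locale_related.append(test_id)
        else:
            other.append(test_id)
    return locale_related, other


# The last line is a hypothetical entry, added only to exercise the filter.
sample_log = """\
FAIL: test_fma_zero_result (test.test_math.FMATests.test_fma_zero_result)
FAIL: test_fpathconf (test.test_os.TestInvalidFD.test_fpathconf)
FAIL: test_example (test.test_locale.ExampleTests.test_example)
"""

locale_related, other = split_failures(sample_log)
print("locale-related:", locale_related)
print("other:", other)
```

A substring heuristic like this can misclassify tests, so it is a way to narrow down what to read by hand rather than a replacement for reading the log.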

    If we ignore the Locale-related failures, here are the two failures I'm seeing that remain:

    ======================================================================
    FAIL: test_fma_zero_result (test.test_math.FMATests.test_fma_zero_result)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/buildbot/buildarea/3.x.ware-alpine/build/Lib/test/test_math.py", line 2760, in test_fma_zero_result
        self.assertIsNegativeZero(math.fma(tiny, -tiny, 0.0))
        ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/buildbot/buildarea/3.x.ware-alpine/build/Lib/test/test_math.py", line 2876, in assertIsNegativeZero
        self.assertTrue(
        ~~~~~~~~~~~~~~~^
            value == 0 and math.copysign(1, value) < 0,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            msg="Expected a negative zero, got {!r}".format(value)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        )
        ^
    AssertionError: False is not true : Expected a negative zero, got 0.0
    
    ----------------------------------------------------------------------
    ======================================================================
    FAIL: test_fpathconf (test.test_os.TestInvalidFD.test_fpathconf)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/buildbot/buildarea/3.x.ware-alpine/build/Lib/test/test_os.py", line 2452, in test_fpathconf
        self.check(os.pathconf, "PC_NAME_MAX")
        ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/buildbot/buildarea/3.x.ware-alpine/build/Lib/test/test_os.py", line 2379, in check
        self.fail("%r didn't raise an OSError with a bad file descriptor"
        ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                  % f)
                  ^^^^
    AssertionError: <built-in function pathconf> didn't raise an OSError with a bad file descriptor
    
    ----------------------------------------------------------------------
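
Both checks can be probed outside the test suite. The sketch below mirrors the conditions visible in the tracebacks: a negative zero must compare equal to zero and carry a negative sign, and a call on a bad file descriptor is expected to raise `OSError`. The helper names `closed_fd` and `raises_ebadf` are hypothetical, the requirement that the error be `EBADF` specifically is an assumption, and so is the value `tiny = 1e-300`, since the tracebacks do not show how `tiny` is defined.

```python
import errno
import math
import os


def is_negative_zero(value):
    """True only for -0.0: equal to zero and carrying a negative sign bit."""
    return value == 0 and math.copysign(1, value) < 0


def closed_fd():
    """Return a descriptor number that was valid a moment ago but is now closed."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    os.close(write_end)
    return read_end


def raises_ebadf(func, *args):
    """Report whether func(<closed fd>, *args) raises OSError with errno EBADF."""
    try:
        func(closed_fd(), *args)
    except OSError as exc:
        return exc.errno == errno.EBADF
    return False


if __name__ == "__main__":
    if hasattr(math, "fma"):  # math.fma was added in Python 3.13
        tiny = 1e-300  # assumed value; tiny * -tiny underflows to zero
        result = math.fma(tiny, -tiny, 0.0)
        print("fma result:", repr(result), "negative zero:", is_negative_zero(result))
    for name in ("pathconf", "fpathconf"):
        if hasattr(os, name):
            print(name, "raises EBADF:", raises_ebadf(getattr(os, name), "PC_NAME_MAX"))
```

Running this on both a glibc-based and a musl-based system should show whether the differences reported by the buildbot reproduce in isolation.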

    These certainly seem worrying, but not "the build is totally invalid and shouldn't exist" levels of worrying (which is what I was looking at this to evaluate -- namely whether https://github.com/docker-library/python can reasonably continue to provide Alpine-based builds or whether we should deprecate and remove them).

    To be very clear, I don't have any good opinions on what CPython maintainers should do here (and in fact would love to follow any recommendations, including deprecating these builds anyhow, if that's the consensus), but I went through some of the data and figured it'd be worth noting the data points I found. :bow: :heart: