Open JacobHenner opened 1 month ago
Are you able to test with pre releases, or attempt git bisect to help us narrow down the issue?
Yes - for the pre-releases, are there specific versions I should attempt?
I would attempt a manual bisection: start with, e.g., beta 1, then iterate depending on whether the failures are present.
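A manual bisection over pre-releases is a binary search for the first bad release. The sketch below assumes the releases are listed in chronological order and that the failure is monotonic (once present, it stays present in every later release), which is the same assumption git bisect makes. `first_bad` and its `is_bad` callback are hypothetical; in practice `is_bad` would build the given release and run the failing tests.

```python
from typing import Callable, Optional, Sequence


def first_bad(releases: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest release in `releases` for which is_bad() is True.

    `releases` must be in chronological order, and failures are assumed to be
    monotonic: if a release is bad, every later release is bad too.
    Returns None if no release is bad.
    """
    lo, hi = 0, len(releases)  # first bad release lies in releases[lo:hi], or nowhere
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid  # mid is bad: the first bad release is mid or earlier
        else:
            lo = mid + 1  # mid is good: the first bad release comes after mid
    return releases[lo] if lo < len(releases) else None
```

If the very first release in the list is already bad (as turned out to be the case for a1 here), the search simply returns it, which hints that the problem predates the range being searched.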
I've reproduced the test_xml_etree error on all rc releases, and also on a1. I'm testing some beta releases now.
Oddly, the test_hashlib error only occurs within our CI build environment, but not when building locally. This is odd because the same Dockerfile is used in both cases, with no known difference in build context that would explain why it fails in one case and succeeds in the other. I'll follow up on this one next.
The failing tests (test_xml_etree*) were all touched in https://github.com/python/cpython/pull/115623/files#diff-cd2315ff25ab49e653d363de2d213e390ecb1b01bbc86e2be3b700db4b0bf454. Perhaps the changes in https://github.com/python/cpython/commit/9f74e86c78853c101a23e938f8e32ea838d8f62e, intended to address test failures for expat < 2.6, were incomplete?
Have you tried with a more recent version of expat (2.6+)? Do the tests succeed?
Just did: test_xml_etree* succeeds with expat 2.6.3 built from source on Rocky 8, with all else remaining equal.
Oddly, the test_hashlib error only occurs within our CI build environment, but not when building locally.
+1 to @JacobHenner – I have the exact same issue with test_hashlib. For 3.13.0 in CI, tests are failing both on rhel and debian. However, for all versions from 3.8 to 3.12, the tests pass without errors. Locally, the Dockerfile builds without errors.
What are you using to build your containers? Docker, podman, Kaniko?
I'm building with Kaniko.
All versions from 3.13.0a1 to 3.13.0rc3 are failing.
However, there is an important difference from your case: only test_hashlib fails. test_xml_etree_c works fine.
I'm building with Kaniko.
I suspect this is related. We're also building with Kaniko in the cases where it's failing. I'm analyzing this further now.
However, there is an important difference from your case: only test_hashlib fails. test_xml_etree_c works fine.
Which distro are you using, and with which version of expat?
We're seeing test_set fail in Docker building on Alpine 3.19 w/ expat 2.6.3
test_memoryview is also failing
I think my previous comments about test_hashlib were misguided because of how the error lines are displayed. The actual error appears to be in test_generators (FAIL: test_raise_and_yield_from (test.test_generators.SignalAndYieldFromTest.test_raise_and_yield_from)). I wonder if this is Kaniko-specific; however, Kaniko builds of 3.9-3.12 do not exhibit this issue.
For comparison, the test_xml_etree and test_xml_etree_c failures are not Kaniko-specific, and are instead associated with the version of expat installed.
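As far as its name and class suggest, test_raise_and_yield_from checks that a SIGINT surfaces as KeyboardInterrupt inside a generator during yield from. For an environment-specific failure like this, one thing that may be worth ruling out is the build process inheriting SIGINT as ignored: CPython only installs its own SIGINT handler (signal.default_int_handler, which raises KeyboardInterrupt) at startup when the inherited disposition is the default, so an ignored SIGINT never turns into KeyboardInterrupt. This is an unconfirmed hypothesis, and `sigint_disposition` is a hypothetical diagnostic helper, meant to be run in the same build step as the failing tests.

```python
import signal


def sigint_disposition() -> str:
    """Describe how SIGINT is handled in the current Python process."""
    handler = signal.getsignal(signal.SIGINT)
    if handler is signal.default_int_handler:
        # Normal case: CPython installed its handler, so SIGINT -> KeyboardInterrupt
        return "python-default"
    if handler == signal.SIG_IGN:
        # Inherited as ignored: CPython leaves it alone, no KeyboardInterrupt
        return "ignored"
    if handler == signal.SIG_DFL:
        # OS default action (terminate the process)
        return "os-default"
    if handler is None:
        # A handler was installed from outside Python (e.g. by embedding C code)
        return "unknown"
    return "custom"


print(sigint_disposition())
```

Seeing "ignored" in the affected build environment would support the theory; "python-default" would argue against it.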
We're seeing test_set fail in Docker building on Alpine 3.19 w/ expat 2.6.3
test_memoryview is also failing
Can you share the specific cases that are failing?
RHEL 8 (and so I assume Rocky Linux 8) has not updated expat, but it has patched it instead. The failing tests are only failing because they check for expat version:
https://github.com/python/cpython/blob/eafd14fbe0fd464b9d700f6d00137415193aa143/Lib/test/test_pyexpat.py#L801
https://github.com/python/cpython/blob/eafd14fbe0fd464b9d700f6d00137415193aa143/Lib/test/test_sax.py#L1253
https://github.com/python/cpython/blob/eafd14fbe0fd464b9d700f6d00137415193aa143/Lib/test/test_xml_etree.py#L1740
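Version checks like these compare the version number that libexpat reports about itself, so a distribution that backports fixes without bumping the version (RHEL 8 ships expat-2.2.5-15.el8) still reports 2.2.5. A quick way to see which expat a given build actually uses is sketched below; `describe_expat` is a hypothetical helper, and the CONFIG_ARGS lookup only yields configure arguments on POSIX builds.

```python
import sysconfig

import pyexpat


def describe_expat() -> dict:
    """Report which libexpat this interpreter uses (hypothetical helper)."""
    # CONFIG_ARGS holds the ./configure arguments on POSIX builds; it is
    # None where there is no configure step (e.g. Windows builds)
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        # e.g. "expat_2.6.3": the runtime library's self-reported version
        "version_string": pyexpat.EXPAT_VERSION,
        # (major, minor, micro) tuple that version-gated tests compare against
        "version_info": pyexpat.version_info,
        # distribution backports do not change this, only the version number does
        "at_least_2_6": pyexpat.version_info >= (2, 6, 0),
        # True if built against the system library instead of the bundled copy
        "system_expat": "--with-system-expat" in config_args,
    }


for key, value in describe_expat().items():
    print(f"{key}: {value}")
```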
Which distro are you using, and with which version of expat?
I'm gradually trimming the Dockerfile down to a minimal example. I've stopped at the latest version of debian:bookworm, but the problem also reproduces on ubi8 and rockylinux, so it does not seem to depend on the distro.
The presence or absence of expat also does not affect the reproducibility of the test_hashlib error (or rather, as you correctly pointed out, the test_generators error).
Here's my Dockerfile in case anyone wants to verify the failing tests:
FROM alpine:3.19
ARG INSTALL_VERSION=3.13.0
ENV PATH="/usr/local/bin:${PATH}"
ENV LANG=C.UTF-8
# if this is called "PIP_VERSION", pip explodes with "ValueError: invalid truth value '<VERSION>'"
# https://pypi.org/project/pip/
ARG PYTHON_PIP_VERSION="24.2"
ARG SETUPTOOLS_VERSION="75.1.0"
ARG PYTHONDONTWRITEBYTECODE=1
SHELL ["/bin/ash", "-eo", "pipefail", "-c"]
# hadolint ignore=DL3003
RUN --mount=type=cache,target=/var/cache/apk \
--mount=type=cache,target=/tmp \
--mount=type=cache,target=/usr/src/python \
# Install fetch dependencies
apk add --no-cache --virtual .fetch-deps \
tar~=1.35 \
xz~=5.4.5 \
\
&& mkdir -p /usr/src/python \
# Fetch installation
&& wget -q -O /tmp/python.tar.xz "https://www.python.org/ftp/python/${INSTALL_VERSION%%[a-z]*}/Python-$INSTALL_VERSION.tar.xz" \
&& tar -xJC /usr/src/python --strip-components=1 -f /tmp/python.tar.xz \
\
# Delete fetch dependencies
&& apk del --no-network .fetch-deps && \
\
# Install build dependencies
apk add --no-cache --virtual .build-deps \
bluez-dev~=5.70 \
bzip2-dev~=1.0.8 \
coreutils~=9.4 \
dpkg-dev~=1.22.1 \
dpkg~=1.22 \
expat-dev~=2.6.3 \
findutils~=4.9.0 \
gcc~=13.2.1 \
gdbm-dev~=1.23 \
gnupg~=2.4.4 \
libc-dev~=0.7.2 \
libffi-dev~=3.4.4 \
libnsl-dev~=2.0.1 \
make~=4.4 \
ncurses-dev~=6.4 \
openssl-dev~=3.1.7 \
pax-utils~=1.3.7 \
readline-dev~=8.2 \
sqlite-dev~=3.44 \
tcl-dev~=8.6.13 \
tk-dev~=8.6.13 \
tk~=8.6.13 \
xz-dev~=5.4.5 && \
\
# Install dependencies
apk add --no-cache \
expat~=2.6.3 \
# CVE-2022-1304
libcom_err~=1.47 \
libuuid~=2.39 \
openssl~=3.1.7 \
ssl_client~=1.36 \
\
# Build Python
&& cd /usr/src/python \
&& gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
&& ./configure \
--build="$gnuArch" \
--enable-loadable-sqlite-extensions \
--enable-optimizations \
--with-lto \
--enable-option-checking=fatal \
--enable-shared \
--with-system-expat \
--without-ensurepip \
&& make -j "$(nproc)" \
# set thread stack size to 1MB so we don't segfault before we hit sys.getrecursionlimit()
# https://github.com/alpinelinux/aports/commit/2026e1259422d4e0cf92391ca2d3844356c649d0
EXTRA_CFLAGS="-DTHREAD_STACK_SIZE=0x100000" \
LDFLAGS="-Wl,--strip-all" \
&& make install \
\
# Install run dependencies
&& find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec scanelf --needed --nobanner --format '%n#p' '{}' ';' \
| tr ',' '\n' \
| sort -u \
| awk 'system("[ -e /usr/local/lib/" $1 " ]") == 0 { next } { print "so:" $1 }' \
| xargs -rt apk add --no-cache --virtual .python-rundeps \
\
# Delete build dependencies
&& apk del --no-network .build-deps \
\
# Clean up installation
&& find /usr/local -depth \
\( \
\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
-o \
\( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
\) -exec rm -rf '{}' + \
\
# Test for proper python installation
&& python3 --version | grep "${INSTALL_VERSION}" \
\
# Perform linking
&& ln -s /usr/local/bin/idle3 /usr/local/bin/idle \
&& ln -s /usr/local/bin/pydoc3 /usr/local/bin/pydoc \
&& ln -s /usr/local/bin/python3 /usr/local/bin/python \
&& ln -s /usr/local/bin/python3-config /usr/local/bin/python-config \
\
# Install PIP
&& wget -q -O /tmp/get-pip.py "https://bootstrap.pypa.io/get-pip.py" \
&& python /tmp/get-pip.py --disable-pip-version-check --no-cache-dir --no-compile \
"setuptools==${SETUPTOOLS_VERSION}" \
"pip==${PYTHON_PIP_VERSION}"
CMD ["python3"]
HEALTHCHECK NONE
It turns out that the test_xml_etree* and test_generators failures are not new; they're just no longer ignored in PROFILE_TASK when they fail (as of Python 3.13). I looked at a build of 3.12.7, and test_xml_etree, test_xml_etree_c, and test_generators are all failing on Rocky Linux 8, but this was not noticed earlier because the results were ignored during the build.
I'm seeing test_xml_etree + test_xml_etree_c failures during the PGO profile task on Ubuntu 22.04 with Python 3.13 too. (Plus, like the above, I see those failures also occurred on older Python versions; the difference now is that they aren't being silently ignored during PGO.)
These tests pass for us on Ubuntu 20.04 and 24.04, just not 22.04. We build with --with-system-expat, so presuming (based on earlier comments) that expat is the issue, that means:
Example failing build log: https://github.com/heroku/heroku-buildpack-python/actions/runs/11259777638/job/31310276989#step:4:1374
The Dockerfile/scripts used to build: https://github.com/heroku/heroku-buildpack-python/blob/ebd222f304ed315e84b7c8c6b8a89898c05e88ca/builds/Dockerfile https://github.com/heroku/heroku-buildpack-python/blob/ebd222f304ed315e84b7c8c6b8a89898c05e88ca/builds/build_python_runtime.sh
xref: https://github.com/heroku/heroku-buildpack-python/pull/1661#issuecomment-2402921271
I now believe the title of this issue is misleading. I've just confirmed that test_xml_etree, test_xml_etree_c, and test_generators all fail in the latest versions of Python 3.9-3.13. This was only noticed in 3.13 because, as of 3.13, PROFILE_TASK failures are no longer ignored during the build.
The specific test cases failures should probably be tracked separately. It's possible that users will see new build failures in other environments where the tests had always been failing, but not breaking the build.
test_xml_etree*: the issue is understood for Rocky Linux: https://github.com/python/cpython/issues/125067#issuecomment-2400717818
test_generators: I've not had time to dive deep on this one yet, but I've only encountered it when building with Kaniko. I'd guess that Kaniko breaks SIGINT handling in some way that's relevant to the test, but I have no evidence to support this theory yet.
@vkurilin:
For 3.13.0 in CI, tests are failing both on rhel and debian.
Strange, we have Debian and RHEL buildbots and the whole test suite pass there.
RHEL 8 (and so I assume Rocky Linux 8) has not updated expat, but it has patched it instead. The failing tests are only failing because they check for expat version:
The version checks not matching Ubuntu's backported versions appear to be the cause in one case (the failure in test_flush_reparse_deferral_disabled); however, two other tests are failing (test_simple_xml_chunk_1 and test_simple_xml_chunk_5) which don't have any expat version checks?
For failure output and links to the full logs, see: https://github.com/heroku/heroku-buildpack-python/pull/1661#issuecomment-2405259352
Also, cross-linking some prior history in this area:
In the meantime, we've had to resort to disabling the three affected tests on Ubuntu 22.04 with Python 3.13.0: https://github.com/heroku/heroku-buildpack-python/pull/1661/commits/47c9f3a7566efe5946e286837ab0fc6952ba6478
test_simple_xml_chunk_1 and test_simple_xml_chunk_5 used to have an expat version check; I am not entirely sure why it was removed in https://github.com/python/cpython/pull/115623/commits/e5e403306c65199f53a02e33a2ac8e28eb45d7a1 (https://github.com/python/cpython/pull/115623).
What are the next steps here? It would be great to have a fix for this for Python 3.13.1 so we don't need custom patches to build from source with PGO enabled :-)
Can someone check whether reverting https://github.com/python/cpython/commit/e5e403306c65199f53a02e33a2ac8e28eb45d7a1 fixes the issue?
Strange, we have Debian and RHEL buildbots and the whole test suite pass there.
Ah, the RHEL 8 buildbots don't use --with-system-expat; they use the embedded copy of libexpat (expat 2.6.3). In this case, test_xml_etree passes.
When using --with-system-expat on RHEL 8.10 (expat version 2.2.5), I get 3 test failures in test_xml_etree:
Can someone check whether reverting https://github.com/python/cpython/commit/e5e403306c65199f53a02e33a2ac8e28eb45d7a1 fixes the issue?
Reverting the change doesn't change anything for expat 2.2.5.
Before this change, test_simple_xml_chunk_1() and test_simple_xml_chunk_5() were skipped on expat 2.6 and newer.
Please open a separate issue for test_generators and test_hashlib.
@serhiy-storchaka: Would you mind having a look at this expat issue? Tests fail on the old expat 2.2.5.
There is no way to know whether reparse deferral was enabled in Expat. pyexpat maintains its own flag, which is set by default to True if and only if the Expat version is >= 2.6.0, and then updates it after calling XML_SetReparseDeferralEnabled(). If reparse deferral was backported to Expat < 2.6.0, pyexpat still thinks that it is disabled. xmlparser.SetReparseDeferralEnabled is a no-op because XML_SetReparseDeferralEnabled does not exist in Expat < 2.6.0. Therefore, we cannot disable it and test for disabled reparse deferral.
I do not see how we can fix this on our side without skipping some tests on vanilla Expat < 2.6.0. I think that if you use a patched Expat, you should also patch Python (at least tests) and all user code which depends on the old Expat behavior.
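The mismatch described above can be modelled in a few lines. `PyexpatModel` is a hypothetical stand-in, not pyexpat's actual code: `flag` plays the role of pyexpat's own reparse-deferral flag, while `library_defers` stands for what the (possibly patched) libexpat really does.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class PyexpatModel:
    """Hypothetical model of pyexpat's reparse-deferral bookkeeping."""

    expat_version: Tuple[int, int, int]
    library_defers: bool  # actual behaviour of the (possibly patched) library
    flag: bool = field(init=False)  # what pyexpat believes and reports

    def __post_init__(self) -> None:
        # The default is derived from the version number alone
        self.flag = self.expat_version >= (2, 6, 0)

    def set_reparse_deferral_enabled(self, enabled: bool) -> None:
        if self.expat_version >= (2, 6, 0):
            # Stands in for a call to XML_SetReparseDeferralEnabled()
            self.library_defers = enabled
            self.flag = enabled
        # Below 2.6.0 there is no XML_SetReparseDeferralEnabled(): a no-op

    def consistent(self) -> bool:
        return self.flag == self.library_defers


vanilla_old = PyexpatModel((2, 2, 5), library_defers=False)
patched_old = PyexpatModel((2, 2, 5), library_defers=True)  # deferral backported
patched_old.set_reparse_deferral_enabled(False)  # has no effect on 2.2.5
print(vanilla_old.consistent(), patched_old.consistent())  # prints: True False
```

In the patched case pyexpat reports deferral as disabled while the library keeps deferring, and there is no call that can switch it off, which is presumably what a test like test_flush_reparse_deferral_disabled runs into.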
@hartwork, I'm afraid this issue will haunt us for years.
expat 2.2.5 was released on November 1, 2017. expat 2.6.0 was released on February 6, 2024.
What do you think of skipping the 3 failing tests on expat < 2.6.0?
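The skip proposed here could be expressed with unittest.skipIf on pyexpat.version_info. The class below is a hypothetical stand-in that borrows one test name from this thread; it is not the real test_xml_etree code. A purely version-based gate like this also skips the test on vanilla Expat < 2.6.0, which is the objection raised in the reply.

```python
import unittest

import pyexpat

# Version-based gate, as proposed; it cannot tell a vanilla 2.2.5 from a
# distribution's patched 2.2.5, since both report the same version
OLD_EXPAT = pyexpat.version_info < (2, 6, 0)


class ReparseDeferralTest(unittest.TestCase):  # hypothetical stand-in
    @unittest.skipIf(OLD_EXPAT, "reparse deferral cannot be controlled on Expat < 2.6.0")
    def test_flush_reparse_deferral_disabled(self):
        parser = pyexpat.ParserCreate()
        if not hasattr(parser, "SetReparseDeferralEnabled"):
            self.skipTest("this Python predates the reparse deferral API")
        parser.SetReparseDeferralEnabled(False)
        self.assertFalse(parser.GetReparseDeferralEnabled())
```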
As far as I understand, this is not a vanilla Expat 2.2.5, but Expat 2.2.5 patched with some changes from 2.6. We do not want to skip tests on platforms that use an Expat < 2.6 without such changes (which may be the majority of computers for now).
I suggest skipping them only on platforms where the patched old Expat is used. We cannot know what changes were backported in the particular distribution, so we should leave this for maintainers of these distributions.
We cannot know what changes were backported in the particular distribution, so we should leave this for maintainers of these distributions.
Full ack, with both my upstream and my downstream hat on.
I suggest closing this issue as "WONT FIX".
Bug report
Bug description:
I'm attempting to build Python 3.13.0 for Rocky Linux 8, and several tests are failing:
These tests are required to succeed in order for --enable-optimizations to work as expected.
I am using the following build options:
3.9-3.12.6 do not exhibit this behavior when built with the same commands (aside from the Python version).
I'm curious whether there might be a regression for certain versions of expat, similar to https://github.com/python/cpython/issues/117187.
expat-devel version is expat-2.2.5-15.el8.
The specific test failures are:
CPython versions tested on:
3.13
Operating systems tested on:
Linux