hmstepanek opened this issue 2 years ago
This looks like a PyPI bug.
`Skipping page https://pypi.org/simple/pytest/ because the GET request got Content-Type: Unknown`
But that is not a valid Content-Type value. You should probably report this problem to https://github.com/pypa/warehouse.
For pip, maybe it’s acceptable to always fall back to the HTML parser if the Content-Type is invalid? PEP 503 does not specify what Content-Type the server should use…? @dstufft
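To make the suggestion concrete, here is a minimal sketch of such a fallback. This is not pip's actual code; the helper names and the PEP 691 JSON content-type check are mine, purely illustrative:

```python
# Minimal sketch of a lenient Content-Type fallback (not pip's real code):
# treat anything other than the PEP 691 JSON content type, including
# invalid values like "Unknown", as HTML instead of skipping the page.
from email.message import Message


def effective_content_type(raw_header):
    # email.message tolerates invalid headers; an un-parseable value
    # (no "type/subtype" form) comes back as the default "text/plain".
    msg = Message()
    msg["Content-Type"] = raw_header
    return msg.get_content_type()


def choose_parser(raw_header):
    if effective_content_type(raw_header) == "application/vnd.pypi.simple.v1+json":
        return "json"
    return "html"  # fall back to the HTML parser for everything else


assert choose_parser("Unknown") == "html"
assert choose_parser("application/vnd.pypi.simple.v1+json") == "json"
```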
I don't think this is a warehouse issue. We see this in cibuildwheel quite a bit too; I've been tracking it at https://github.com/pypa/cibuildwheel/issues/1254, with examples of failures on 22.2.2 and 21.3.1.
I have a minimal recreation of the crash here: https://github.com/joerick/pip-concurrency-debug. Or see this actions run (ignore the cleanup errors after the crash).
My theory is that this is related to concurrent access to pip's cache (that was also an issue a few years back, in https://github.com/pypa/pip/issues/5345, which gave me the idea). But I can only get the crash to recreate reliably when running different versions of pip at the same time. That is something our integration tests on cibuildwheel do: we run tests in parallel, and we run different versions of pip because we still support Python 3.6. I note that the newrelic crash above is also running tests in parallel via tox, and is running different versions of pip for the same reason: old versions of Python.
In terms of specific versions, here are the results from my tests:
| pip versions running simultaneously | result |
|---|---|
| 22.2.2, 21.3.1, 20.3.4 | 💥 |
| 20.3.4 | ✅ |
| 22.2.2 | ✅ |
| 21.3.1 | ✅ |
| 21.3.1, 20.3.4 | ✅ |
| 22.2.2, 21.3.1, 20.3.4 | 💥 |
| 22.2.2, 20.3.4 | 💥 |
| 22.2.2, 21.3.1 | 💥 |
| 22.2.2, 22.2.1 | ✅ |
| 22.2, 22.1.2 | 💥 |
So the issue seems to be using pip<22.2 and pip>=22.2 at the same time. Note that I have to spin up 10 concurrent threads constantly installing/uninstalling to hit this semi-reliably, so it's not easy to recreate!
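For reference, the stress test is shaped roughly like this (a simplified sketch with hypothetical venv paths; the real recreation is in the pip-concurrency-debug repo linked above):

```python
# Sketch of the stress test: several threads constantly installing and
# uninstalling with different pip versions against the shared cache.
# The venv paths below are hypothetical.
import subprocess
import threading

PIP_BINARIES = [
    "venv-pip-22.2.2/bin/pip",  # pip >= 22.2
    "venv-pip-21.3.1/bin/pip",  # pip < 22.2
]


def hammer(pip):
    # Install and uninstall in a loop so all threads hit the shared
    # pip cache concurrently.
    for _ in range(20):
        subprocess.run([pip, "install", "pytest==6.2.5"], check=True)
        subprocess.run([pip, "uninstall", "-y", "pytest"], check=True)


threads = [
    threading.Thread(target=hammer, args=(pip,))
    for pip in PIP_BINARIES
    for _ in range(5)  # 10 threads total
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```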
Giving each pip version a separate PIP_CACHE_DIR appears to fix the issue, which seems to implicate the shared cache as the source of the problems. @hmstepanek I'd be very curious whether you newrelic folks find that separate cache dirs alleviate the issue for you. The errors on cibuildwheel's CI are too sporadic to test any workaround there, and the runs take far too long.
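For concreteness, applying the workaround looks roughly like this (a sketch: PIP_CACHE_DIR is pip's documented cache-directory override, but the venv and cache paths here are hypothetical):

```python
# Point each pip version at its own cache via the PIP_CACHE_DIR
# environment variable. Paths below are hypothetical.
import os
import subprocess


def pip_install(pip_binary, package, cache_dir):
    env = dict(os.environ, PIP_CACHE_DIR=cache_dir)
    subprocess.run([pip_binary, "install", package], env=env, check=True)


pip_install("venv-pip-22.2.2/bin/pip", "pytest==6.2.5", "/tmp/pip-cache-22.2.2")
pip_install("venv-pip-21.3.1/bin/pip", "pytest==6.2.5", "/tmp/pip-cache-21.3.1")
```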
@joerick Thank you for the insight! What you said makes total sense and would explain why not a lot of people seem to have this issue. I was beginning to wonder if it was just us. 🙈 I will give that workaround a try and see if it fixes the issue!
We were able to resolve this by disabling the cache when running pip on Python 2.7. See Fix Pip Concurrency Issues for details. So we can confirm it does appear to be an issue with pip caches shared across pip versions.
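The shape of the fix, simplified (the real change is in the PR referenced above; --no-cache-dir is pip's actual flag, but the helper here is only illustrative):

```python
# Illustrative helper: disable pip's cache only for the Python 2.7
# environments, since old pips sharing a cache with pip>=22.2 trigger
# the crash. --no-cache-dir is a real pip flag; the rest is a sketch.
import subprocess


def pip_install(pip_binary, package, is_py27):
    cmd = [pip_binary, "install"]
    if is_py27:
        cmd.append("--no-cache-dir")
    cmd.append(package)
    subprocess.run(cmd, check=True)
```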
### Description

Our GitHub Actions CI has been failing sporadically with the following errors since 7/26:
Example 1:
https://github.com/newrelic/newrelic-python-agent/runs/7621668567?check_suite_focus=true
Example 2:
https://github.com/newrelic/newrelic-python-agent/runs/7621668309?check_suite_focus=true
We tried explicitly disabling the caching directory, but this did not fix the issue. Around the same time this issue began, we noticed the following issue filed against warehouse: https://github.com/pypi/warehouse/issues/11949. We also noticed that the following issue appears to be the same one we are running into now, though the root cause may be different: https://github.com/pypa/pip/issues/5345.
### Expected behavior

`.tox/postgres-datastore_psycopg2-py39-psycopg20208/bin/pip install pytest==6.2.5`

succeeds.

### pip version

22.2.1
### Python version

I have personally seen failures on 3.9.13, 3.8.13, and 2.7.18. These failures appear to be independent of the Python version.
### OS
Ubuntu 20.04.4 LTS
### How to Reproduce

**Note:** we have not been able to reproduce this locally on our machines, only inside our GitHub Actions CI.
### Output