pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DEV pip install -r requirements-dev.txt takes...54 minutes? #50239

Closed: MarcoGorelli closed this issue 1 year ago

MarcoGorelli commented 1 year ago

Example from a CI run: https://github.com/pandas-dev/pandas/actions/runs/3687244987/jobs/6240575739

Logs show:

2022-12-13T17:19:32.1280438Z Collecting boto3
2022-12-13T17:19:32.1360450Z   Downloading boto3-1.26.22-py3-none-any.whl (132 kB)
2022-12-13T17:19:32.1448476Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 132.6/132.6 kB 23.1 MB/s eta 0:00:00
2022-12-13T17:19:37.4134586Z Collecting botocore
2022-12-13T17:19:37.4207221Z   Downloading botocore-1.29.22-py3-none-any.whl (10.2 MB)

[>30 minutes later]

2022-12-13T18:10:45.7020268Z Collecting boto3
2022-12-13T18:10:45.7113509Z   Downloading boto3-1.17.107-py2.py3-none-any.whl (131 kB)
2022-12-13T18:10:45.7197608Z      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 131.6/131.6 kB 25.5 MB/s eta 0:00:00
2022-12-13T18:10:51.7547649Z Collecting botocore
2022-12-13T18:10:51.7657507Z   Downloading botocore-1.20.107-py2.py3-none-any.whl (7.7 MB)

[then finally]

2022-12-13T18:12:26.4475803Z Successfully installed MarkupSafe-2.1.1 Send2Trash-1.8.0 aiobotocore-1.4.2 aiohttp-3.8.3 aioitertools-0.11.0 aiosignal-1.3.1 alabaster-0.7.12 anyio-3.6.2 argon2-cffi-21.3.0 argon2-cffi-bindings-21.2.0 arrow-1.2.3 asttokens-2.2.1 asv-0.5.1 async-timeout-4.0.2 attrs-22.1.0 babel-2.11.0 backcall-0.2.0 beautifulsoup4-4.11.1 black-22.10.0 bleach-5.0.1 blosc-1.11.0 boto3-1.17.106 botocore-1.20.106 bottleneck-1.3.5 brotlipy-0.7.0 cachetools-5.2.0 certifi-2022.12.7 cffi-1.15.1 cfgv-3.3.1 charset-normalizer-2.1.1 click-8.1.3 cloudpickle-2.2.0 comm-0.1.2 contourpy-1.0.6 coverage-6.5.0 cpplint-1.6.1 cramjam-2.6.2 cryptography-38.0.4 cycler-0.11.0 cython-0.29.32 dask-2022.12.0 debugpy-1.6.4 decorator-5.1.1 defusedxml-0.7.1 distlib-0.3.6 docutils-0.17.1 entrypoints-0.4 et-xmlfile-1.1.0 exceptiongroup-1.0.4 execnet-1.9.0 executing-1.2.0 fastjsonschema-2.16.2 fastparquet-2022.12.0 feedparser-6.0.10 filelock-3.8.2 flake8-6.0.0 flake8-bugbear-22.7.1 flask-2.2.2 fonttools-4.38.0 fqdn-1.5.1 frozenlist-1.3.3 fsspec-2021.11.0 gcsfs-2021.11.0 gitdb-4.0.10 gitpython-3.1.29 google-api-core-2.11.0 google-auth-2.15.0 google-auth-oauthlib-0.8.0 google-cloud-core-2.3.2 google-cloud-storage-2.7.0 google-crc32c-1.5.0 google-resumable-media-2.4.0 googleapis-common-protos-1.57.0 greenlet-2.0.1 html5lib-1.1 hypothesis-6.61.0 identify-2.5.9 idna-3.4 imagesize-1.4.1 importlib-metadata-5.1.0 importlib-resources-5.10.1 iniconfig-1.1.1 ipykernel-6.19.2 ipython-8.7.0 ipython-genutils-0.2.0 ipywidgets-8.0.3 isoduration-20.11.0 isort-5.11.1 itsdangerous-2.1.2 jedi-0.18.2 jinja2-3.1.2 jmespath-0.10.0 jsonpointer-2.3 jsonschema-4.17.3 jupyter-client-7.4.8 jupyter-core-5.1.0 jupyter-events-0.5.0 jupyter-server-2.0.1 jupyter-server-terminals-0.4.2 jupyterlab-pygments-0.2.2 jupyterlab-widgets-3.0.4 kiwisolver-1.4.4 llvmlite-0.39.1 locket-1.0.0 lxml-4.9.1 markdown-3.4.1 matplotlib-3.6.2 matplotlib-inline-0.1.6 mccabe-0.7.0 mistune-2.0.4 moto-4.0.11 multidict-6.0.3 mypy-0.990 
mypy-extensions-0.4.3 natsort-8.2.0 nbclassic-0.4.8 nbclient-0.7.2 nbconvert-7.2.6 nbformat-5.7.0 nbsphinx-0.8.10 nest-asyncio-1.5.6 nodeenv-1.7.0 notebook-6.5.2 notebook-shim-0.2.2 numba-0.56.4 numexpr-2.8.4 numpy-1.23.5 numpydoc-1.5.0 oauthlib-3.2.2 odfpy-1.4.1 openpyxl-3.0.10 packaging-22.0 pandas-1.5.2 pandas-dev-flaker-0.5.0 pandoc-2.3 pandocfilters-1.5.0 parso-0.8.3 partd-1.3.0 pathspec-0.10.3 pexpect-4.8.0 pickleshare-0.7.5 pillow-9.3.0 pkgutil-resolve-name-1.3.10 platformdirs-2.6.0 pluggy-1.0.0 plumbum-1.8.0 ply-3.11 pre-commit-2.20.0 prometheus-client-0.15.0 prompt-toolkit-3.0.36 protobuf-4.21.11 psutil-5.9.4 psycopg2-binary-2.9.5 ptyprocess-0.7.0 pure-eval-0.2.2 py-1.11.0 pyarrow-9.0.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycodestyle-2.10.0 pycparser-2.21 pydata-sphinx-theme-0.10.1 pyflakes-3.0.1 pygments-2.13.0 pymysql-1.0.2 pyparsing-3.0.9 pyreadstat-1.2.0 pyrsistent-0.19.2 pytest-7.2.0 pytest-asyncio-0.20.3 pytest-cov-4.0.0 pytest-cython-0.2.0 pytest-xdist-3.1.0 python-dateutil-2.8.2 python-json-logger-2.0.4 python-snappy-0.6.1 pytz-2022.6 pyupgrade-3.3.1 pyxlsb-1.0.10 pyyaml-6.0 pyzmq-24.0.1 requests-2.28.1 requests-oauthlib-1.3.1 responses-0.22.0 rfc3339-validator-0.1.4 rfc3986-validator-0.1.1 rsa-4.9 s3fs-2021.11.0 s3transfer-0.4.2 scipy-1.9.3 seaborn-0.12.1 setuptools-65.6.3 sgmllib3k-1.0.0 six-1.16.0 smmap-5.0.0 sniffio-1.3.0 snowballstemmer-2.2.0 sortedcontainers-2.4.0 soupsieve-2.3.2.post1 sphinx-4.5.0 sphinx-copybutton-0.5.1 sphinx-panels-0.6.0 sphinx-toggleprompt-0.3.1 sphinxcontrib-applehelp-1.0.2 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.0 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 sqlalchemy-1.4.45 stack-data-0.6.2 tables-3.7.0 tabulate-0.9.0 terminado-0.17.1 tinycss2-1.2.1 tokenize-rt-5.0.0 toml-0.10.2 tomli-2.0.1 toolz-0.12.0 tornado-6.2 traitlets-5.7.1 types-PyMySQL-1.0.19.1 types-python-dateutil-2.8.19.4 types-pytz-2022.6.0.1 types-setuptools-65.6.0.2 types-toml-0.10.8.1 
typing-extensions-4.4.0 tzdata-2022.7 uri-template-1.2.0 urllib3-1.26.13 versioneer-0.28 virtualenv-20.17.1 wcwidth-0.2.5 webcolors-1.12 webencodings-0.5.1 websocket-client-1.4.2 werkzeug-2.2.2 widgetsnbextension-4.0.4 wrapt-1.14.1 xarray-2022.12.0 xlrd-2.0.1 xlsxwriter-3.0.3 xmltodict-0.13.0 yarl-1.8.2 zipp-3.11.0 zstandard-0.19.0

Or, to paraphrase:

[image]

Is there a way to speed this up somehow?
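The log excerpts above show pip downloading boto3 1.26.22 and botocore 1.29.22, then, roughly 50 minutes later, boto3 1.17.107 and botocore 1.20.107, before finally installing boto3 1.17.106 and botocore 1.20.106. Fetching progressively older releases of the same package is consistent with pip's dependency resolver backtracking. The minimal Python sketch below (an illustration, not pandas tooling) scans GitHub Actions log lines for wheels downloaded in more than one version and measures elapsed time; the function names are hypothetical, and the regex assumes wheel filenames of the form `<name>-<version>-<tags>.whl` and ignores sdists.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as
# "2022-12-13T17:19:37.4207221Z   Downloading botocore-1.29.22-py3-none-any.whl (10.2 MB)".
# Assumption: wheel filenames look like "<name>-<version>-<tags>.whl"; sdists are skipped.
DOWNLOAD_RE = re.compile(r"Downloading (?P<name>[A-Za-z0-9_.]+?)-(?P<version>\d[^-]*)-\S*\.whl")


def downloaded_versions(log_lines):
    """Map each package name to the distinct versions pip downloaded, in log order."""
    seen = defaultdict(list)
    for line in log_lines:
        match = DOWNLOAD_RE.search(line)
        if match:
            name, version = match.group("name").lower(), match.group("version")
            if version not in seen[name]:
                seen[name].append(version)
    return dict(seen)


def backtracked(log_lines):
    """Packages fetched in more than one version, a likely sign of resolver backtracking."""
    return {name: versions for name, versions in downloaded_versions(log_lines).items()
            if len(versions) > 1}


def timestamp(line):
    # GitHub Actions prefixes each line with e.g. "2022-12-13T17:19:37.4207221Z";
    # keep whole seconds only so strptime can parse it on any Python 3 version.
    return datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    excerpt = [
        "2022-12-13T17:19:32.1360450Z   Downloading boto3-1.26.22-py3-none-any.whl (132 kB)",
        "2022-12-13T17:19:37.4207221Z   Downloading botocore-1.29.22-py3-none-any.whl (10.2 MB)",
        "2022-12-13T18:10:45.7113509Z   Downloading boto3-1.17.107-py2.py3-none-any.whl (131 kB)",
        "2022-12-13T18:10:51.7657507Z   Downloading botocore-1.20.107-py2.py3-none-any.whl (7.7 MB)",
    ]
    print(backtracked(excerpt))
    print(timestamp(excerpt[-1]) - timestamp(excerpt[0]))
```

On these four lines from the log it reports both boto3 and botocore as fetched in two versions, with 0:51:19 between the first and last download.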

mroeschke commented 1 year ago

From a cursory search, maybe this is fallout from https://github.com/boto/boto3/issues/3515 and https://github.com/boto/boto3/issues/3516?

rhshadrach commented 1 year ago

Edit: I made it far enough in the log to see that @mroeschke has likely identified the cause.

I'm not familiar with this; could it be an issue?

2022-12-13T17:18:05.5870535Z ##[warning]Failed to restore: Aborting cache download as the download time exceeded the timeout.
2022-12-13T17:18:05.8393423Z pip cache is not found

tpackard1 commented 1 year ago

This is my first time trying to create a development environment according to the docs, and I am experiencing the same issue on Ubuntu 20.04. @mroeschke's cursory search points to this workaround, which I tried before finding this issue, with no luck:

python3 -m pip install --upgrade pip
Requirement already satisfied: pip in /home/tpackard/virtualenvs/pandas-dev-3.10/lib/python3.10/site-packages (22.3.1)

I also tried limiting the versions of boto3 and botocore, as suggested by pip's output, which didn't work either.
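For reference, pip can apply version limits without editing requirements-dev.txt through a constraints file passed with `-c`. The sketch below is an assumption about how one might do that, not a record of what was tried in this comment: the pins are simply the versions the CI run above ended up installing (boto3 1.17.106, botocore 1.20.106), and the file name `constraints-boto.txt` and helper names are hypothetical. Whether any given pins help depends on them being compatible with everything else in the requirements file.

```python
import sys
from pathlib import Path

# Hypothetical pins: the versions the CI log above finally installed, not a recommendation.
PINS = {"boto3": "1.17.106", "botocore": "1.20.106"}


def write_constraints(path, pins):
    """Write a pip constraints file containing one exact pin per line."""
    text = "".join(f"{name}=={version}\n" for name, version in sorted(pins.items()))
    Path(path).write_text(text)
    return text


def pip_install_command(requirements, constraints):
    """Build (but do not run) a pip command installing `requirements` under `constraints`."""
    return [sys.executable, "-m", "pip", "install",
            "-r", str(requirements), "-c", str(constraints)]


if __name__ == "__main__":
    cmd = pip_install_command("requirements-dev.txt", "constraints-boto.txt")
    # After write_constraints("constraints-boto.txt", PINS), run e.g. subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

A constraints file only restricts versions of packages that something else already requires; it never adds packages to the install on its own.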

None of these helped. As a noob, I have to ask: was this not an issue last week? Or is it kind of a niche problem because everyone is using docker or mamba? And if this issue persists, should using pip to create the development environment be treated as a last-resort option, since it takes so long?

MarcoGorelli commented 1 year ago

Or is it kind of a niche problem because everyone is using docker or mamba

I think this is a big part of the issue, yeah; most of the people I've asked are using one of them.

MarcoGorelli commented 1 year ago

Locally, I tried just removing boto3, botocore, and aiobotocore from requirements-dev.txt, and installation worked fine; it only took a few minutes.

@tpackard1, until this is resolved, could you try that to unblock your local development (unless you're running boto-related tests)?
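The workaround described above can be applied without editing the tracked file. This is a minimal Python sketch, with hypothetical helper names, that copies requirements-dev.txt to a new file (the name `requirements-dev-noboto.txt` is also hypothetical) while dropping the boto3, botocore, and aiobotocore lines. Names are compared after PEP 503 normalization, so spellings such as `Boto3` also match. This only removes the direct requirements: other packages in the file may still pull some of these in as dependencies.

```python
import re
from pathlib import Path

# Packages removed locally in the comment above.
EXCLUDE = {"boto3", "botocore", "aiobotocore"}

# A requirement line starts with the project name; the match stops at the first
# version specifier, extras bracket, environment marker, or whitespace.
# Comments ("#") and pip options ("-r", "--index-url") never match.
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonical(name):
    """Normalize a project name as in PEP 503: lowercase, runs of -_. become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_requirements(lines, exclude=EXCLUDE):
    """Return the lines whose project name is not in `exclude`."""
    excluded = {canonical(name) for name in exclude}
    kept = []
    for line in lines:
        match = NAME_RE.match(line)
        if match and canonical(match.group(1)) in excluded:
            continue
        kept.append(line)
    return kept


def write_filtered(src, dst, exclude=EXCLUDE):
    """Copy `src` to `dst` without the excluded requirements; return the kept lines."""
    kept = filter_requirements(Path(src).read_text().splitlines(), exclude)
    Path(dst).write_text("\n".join(kept) + "\n")
    return kept


if __name__ == "__main__":
    # Hypothetical sample lines, not the actual contents of requirements-dev.txt.
    sample = ["boto3", "botocore>=1.20", "aiobotocore<2.0.0", "s3fs>=2021.08.0"]
    print(filter_requirements(sample))
```

Against the real file, `write_filtered("requirements-dev.txt", "requirements-dev-noboto.txt")` followed by `python -m pip install -r requirements-dev-noboto.txt` would mirror the workaround.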