pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0

Placeholder: Various (potentially NFS or LMDB) filesystem-related bugs #14599

Open · jsirois opened this issue 2 years ago

jsirois commented 2 years ago

There may be multiple unrelated problems here, or they may all be related. I'm collecting them in one place for now and will break out separate tickets if needed as more is learned.


Some occur while the setup script runs in CI:

18:30:01 SHA256 fingerprint of file:///var/tmp/pjenkslv/jenkins/workspace/casper-codetree-release-lib-ANY/casper/codetree/codetree/src/pantsbuild/pex verified.
18:30:01 Preparinng bootstrap with initial requirements
18:30:01 /ms/dist/python/PROJ/core/3.7.5/exec/bin/python: can't find '__main__' module in '/var/tmp/emmdev/.pex/unzipped_pexes/478cc1fa371ca40aa3e7dafee735ca438d4a243f'

Or:

22:56:37 SHA256 fingerprint of file:///var/tmp/pjenkslv/jenkins/workspace/casper-codetree-release-lib-ANY/casper/codetree/codetree/src/pantsbuild/pex verified.
22:56:37 Preparinng bootstrap with initial requirements
22:56:37 PEX PATH: /var/tmp/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/pex-2.1.42/pex
22:56:37 /var/tmp/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/pex-2.1.42/pex
22:57:48 /var/tmp/emmdev/.pex/unzipped_pexes/478cc1fa371ca40aa3e7dafee735ca438d4a243f/.deps/pex-2.1.42-py2.py3-none-any.whl/pex/tools/commands/venv.py:141: PEXWarning: Encountered collision building venv at /var/tmp/emmdev/.pex/venvs/short/c9600bda from /var/tmp/emmdev/.pex/pip.pex/46820cb5af0dcf9295a4e7f30184cc0e9fa063dc:
22:57:48 1. /var/tmp/emmdev/.pex/venvs/720739c5b08326cc23c9ac0b68c11307ad60aca3/1fd650467e13c9fc5e0f7b7915a685aa6aec963f.02df45dac6084a0a97a4934629166fe7/lib/python3.7/site-packages/constraints.txt was provided by:
22:57:48  /var/tmp/emmdev/.pex/pip.pex/46820cb5af0dcf9295a4e7f30184cc0e9fa063dc/.deps/setuptools/constraints.txt
22:57:48  /var/tmp/emmdev/.pex/pip.pex/46820cb5af0dcf9295a4e7f30184cc0e9fa063dc/.deps/wheel/constraints.txt
22:57:48   pex_warnings.warn(message)
22:57:48 Installing pantsbuild.pants==2.9.0 into a virtual environment at /var/tmp/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/2.9.0_py37
22:57:50 created virtual environment CPython3.7.5.final.0-64 in 189ms
22:57:50   creator CPython3Posix(dest=/var/tmp/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/pants.BVOx5n/install, clear=False, no_vcs_ignore=False, global=False)
22:57:50   seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/var/tmp/emmdev/.local/share/virtualenv)
22:57:50     added seed packages: pip==21.1.2, setuptools==57.0.0, wheel==0.36.2
22:57:50   activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
22:57:50 ./pants: line 364: /var/tmp/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/pants.BVOx5n/install/bin/pip: No such file or directory

Others occur while packaging PEX binaries:

19:40:07 /v/campus/ny/cs/casper/taymarti/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/2.9.0_py37
19:40:07 /v/campus/ny/cs/casper/taymarti/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/2.9.0_py37/bin/python
19:40:07 /v/campus/ny/cs/casper/taymarti/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/2.9.0_py37/bin/pants
19:40:07 /v/campus/ny/cs/casper/taymarti/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/2.9.0_py37/bin/python /v/campus/ny/cs/casper/taymarti/emmdev/.cache/pants/setup/bootstrap-Linux-x86_64/2.9.0_py37/bin/pants --pants-bin-name=./pants --pants-version=2.9.0 package ./proj/libs/clients/**
19:40:07 lrwxrwxrwx 1 emmdev ir_share 11 Feb 23 19:01 /a/stor118ncs2.new-york.ms.com/sc25317/s122015/taymarti/emmdev/.cache/pants/named_caches/pex_root/venvs/1d932cd3e82057d61e57c365b177aad9b535724c/9a128dacefb3843fa45de2c0dc225c7ee1cb4d0e/pex -> __main__.py
19:40:07 lrwxrwxrwx 1 emmdev ir_share 11 Feb 23 19:01 /v/campus/ny/cs/casper/taymarti/emmdev/.cache/pants/named_caches/pex_root/venvs/1d932cd3e82057d61e57c365b177aad9b535724c/9a128dacefb3843fa45de2c0dc225c7ee1cb4d0e/pex -> __main__.py
19:40:07 19:00:15.11 [INFO] Starting: Building build_backend.pex from setuptools_default_lockfile.txt
19:41:31 19:01:31.87 [INFO] Long running tasks:
19:41:31  76.76s  Building build_backend.pex from setuptools_default_lockfile.txt
19:41:36 19:01:36.06 [INFO] Completed: Building build_backend.pex from setuptools_default_lockfile.txt
19:41:36 19:01:36.14 [ERROR] 1 Exception encountered:
19:41:36
19:41:36  ProcessExecutionFailure: Process 'Building build_backend.pex from setuptools_default_lockfile.txt' failed with exit code 1.
19:41:36 stdout:
19:41:36
19:41:36 stderr:
19:41:36 Failed to spawn a job for DistributionTarget(interpreter=PythonInterpreter('/ms/dist/python/PROJ/core/3.7.5-0/.exec/@sys/bin/python3.7', PythonIdentity('/ms/dist/python/PROJ/core/3.7.5-0/.exec/@sys/bin/python3.7', 'cp37', 'cp37m', 'manylinux_2_17_x86_64', (3, 7, 5)))): [Errno 2] No such file or directory: '/a/stor118ncs2.new-york.ms.com/sc25317/s122015/taymarti/emmdev/.cache/pants/named_caches/pex_root/venvs/1d932cd3e82057d61e57c365b177aad9b535724c/9a128dacefb3843fa45de2c0dc225c7ee1cb4d0e/pex': '/a/stor118ncs2.new-york.ms.com/sc25317/s122015/taymarti/emmdev/.cache/pants/named_caches/pex_root/venvs/1d932cd3e82057d61e57c365b177aad9b535724c/9a128dacefb3843fa45de2c0dc225c7ee1cb4d0e/pex'
19:41:36
19:41:36
19:41:36
19:41:36 Use `--no-process-cleanup` to preserve process chroots for inspection.

Or:

09:58:23 09:58:22.22 [ERROR] 1 Exception encountered:
09:58:23
09:58:23   Exception: Snapshot failed: Error storing Digest { hash: Fingerprint<827b11f107d720685bb9b013fafc76f126779197f63f9a831300554e714949ec>, size_bytes: 77 }: MDB_CURSOR_FULL: Internal error - cursor stack limit reached
jsirois commented 2 years ago

For the last one, it appears LMDB is a no-go on NFS, which makes sense. See the "Caveats" section at http://www.lmdb.tech/doc/, in particular the part beginning "...when several processes can use a database concurrently:".

Some unofficial opinion on this here: https://news.ycombinator.com/item?id=18414124
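Since the LMDB caveat hinges on whether the store lives on a remote filesystem, a quick way to check a given directory may help when triaging reports like the ones above. What follows is a minimal, Linux-only sketch, assuming the `/proc/mounts` layout (device, mount point, fstype, ...). The function names are hypothetical and not part of Pants; the sketch simply finds the deepest mount point containing the path and reports its filesystem type.

```python
import os
import re
from typing import Optional


def _unescape(field: str) -> str:
    """Decode the octal escapes (e.g. \\040 for a space) used in /proc/mounts fields."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def fstype_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the deepest mount point containing `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point, fstype, options, ...
    Later entries win ties, since a later mount on the same point shadows earlier ones.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = _unescape(fields[1]), fields[2]
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        if contains and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_on_nfs(path: str, mounts_file: str = "/proc/mounts") -> bool:
    """Linux-only check. Resolves symlinks first, since NFS paths are often reached via links."""
    with open(mounts_file) as f:
        mounts_text = f.read()
    fstype = fstype_for(os.path.realpath(path), mounts_text)
    return fstype is not None and fstype.startswith("nfs")
```

For example, `is_on_nfs(os.path.expanduser("~/.cache/pants/lmdb_store"))` would check the store directory, assuming that is where `[GLOBAL].local_store_dir` points (adjust if it is configured elsewhere). Resolving symlinks matters here because, as the logs above show, the same cache can be reached through more than one path.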

cczona commented 2 years ago

Should we add a mention to Pants docs about this caveat?

jsirois commented 2 years ago

Yeah. Once we have a clue what we're talking about - I don't yet - we should chart the dragons in the waters. We're definitely still in the hand waving phase though.

stuhood commented 2 years ago

#13401 is an LMDB-related limit rather than a filesystem issue per se, but I'm linking it here since we don't have a bug category for this.

stuhood commented 2 years ago

@thejcannon also encountered a very large directory entry which, because of how https://www.pantsbuild.org/docs/reference-global#section-local-store-directories-max-size-bytes and https://www.pantsbuild.org/docs/reference-global#section-local-store-shard-count interact, caused a single shard to hit its 16GB / 16 = 1GB limit while the other shards stayed in the ~2MB range.
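The arithmetic in the previous comment is worth spelling out. If the total budget is split evenly across shards (assumed here, matching the 16GB / 16 = 1GB figure), one oversized entry can fill its shard while the store as a whole is nearly empty. The sketch below uses the comment's numbers, treating "GB" as GiB. The helper names and the choice of shard index are hypothetical.

```python
# Illustrative numbers only, taken from the comment above (16GB budget, 16 shards, ~2MB shards).
GIB = 1024 ** 3
MIB = 1024 ** 2


def per_shard_limit(max_size_bytes: int, shard_count: int) -> int:
    """Assumes the total store budget is split evenly across shards."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return max_size_bytes // shard_count


def full_shards(shard_sizes: list, max_size_bytes: int, shard_count: int) -> list:
    """Indexes of shards that have reached their individual cap."""
    limit = per_shard_limit(max_size_bytes, shard_count)
    return [i for i, size in enumerate(shard_sizes) if size >= limit]


sizes = [2 * MIB] * 16
sizes[3] = 1 * GIB  # hypothetical: the shard that received the very large directory entry

print(per_shard_limit(16 * GIB, 16) // GIB)  # 1
print(full_shards(sizes, 16 * GIB, 16))      # [3]
print(f"{sum(sizes) / GIB:.2f} GiB used of 16")  # 1.03 GiB used of 16
```

About 1GiB of a 16GiB budget is in use, yet one shard is already at its cap. With an even split, the largest single entry the store can hold is bounded by the per-shard slice, not by the total.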