pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org

Running `./pants -no-pantsd --no-watch-filesystem version` under linux/amd64 docker image on ARM macOS requires ~4G of memory #17622

Open · huonw opened this issue 1 year ago

huonw commented 1 year ago

Describe the bug

As discussed in https://pantsbuild.slack.com/archives/C046T6T9U/p1669150590877779, it seems Pants doesn't easily run under Docker with an ARM macOS host:

```shell
uname -ms
# Darwin arm64

docker pull --platform linux/amd64 python@sha256:c1613835d7be322f98603f356b9e0c9d40f9589e94dc9f710e714a807a665700  # 3.9 at the time of writing
# ...

docker run -t --platform linux/amd64 python@sha256:c1613835d7be322f98603f356b9e0c9d40f9589e94dc9f710e714a807a665700 bash -c 'echo -e "[GLOBAL]\npants_version = \"2.14.0\"" > pants.toml && curl -L -O https://static.pantsbuild.org/setup/pants && chmod +x ./pants && ./pants -ldebug --no-pantsd --no-watch-filesystem  version'
# ...
# Bootstrapping Pants using /usr/local/bin/python3.9
# ...
# New virtual environment successfully created at /root/.cache/pants/setup/bootstrap-Linux-x86_64/2.14.0_py39.
# 22:53:01.00 [DEBUG] File handle limit is: 1048576
# 22:53:03.05 [DEBUG] Using [cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }, cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }] for process execution.
# 22:53:04.12 [DEBUG] Launching 1 roots (poll=false).
# 22:53:04.26 [DEBUG] computed 1 nodes in 0.136536 seconds. there are 7 total nodes.
# 22:53:04.51 [DEBUG] File handle limit is: 1048576
# 22:53:06.61 [DEBUG] Using [cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }, cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }] for process execution.
# 22:53:06.83 [DEBUG] specs are: Specs(includes=RawSpecs(description_of_origin='CLI arguments', address_literals=(), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=True, from_change_detection=False), ignores=RawSpecs(description_of_origin='CLI arguments', address_literals=(), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=False, from_change_detection=False))
# 22:53:06.83 [DEBUG] changed_options are: ChangedOptions(since=None, diffspec=None, dependees=<DependeesOption.NONE: 'none'>)
# 22:53:06.83 [DEBUG] Launching 1 roots (poll=false).
# 22:53:06.85 [WARN] Please either set `enabled = true` in the [anonymous-telemetry] section of pants.toml to enable sending anonymous stats to the Pants project to aid development, or set `enabled = false` to disable it. No telemetry sent for this run. An explicit setting will get rid of this message. See https://www.pantsbuild.org/v2.14/docs/anonymous-telemetry for details.
# 22:53:06.86 [DEBUG] computed 1 nodes in 0.024304 seconds. there are 12 total nodes.
# 2.14.0
```

The `--no-pantsd --no-watch-filesystem` flags are prompted by https://www.pantsbuild.org/docs/prerequisites#macos.

Watching `top` (e.g. `docker ps` to find the running container name, followed by `docker exec -t ...name... top`) indicates that the memory usage of the Pants Python process is reasonable for a while, and then starts climbing to ~3.6G resident and ~600G virtual before succeeding. When running in a container with a lower memory limit (e.g. 3G), the process is killed instead (after the `File handle limit is: 1048576` log line that's at 22:53:04.51 in the quote above).
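For reference, the same numbers can be polled from `/proc` instead of eyeballing `top`; this is only an illustrative sketch (the `mem_gib` helper is a hypothetical name), reading the `VmRSS`/`VmSize` fields that `top` reports as RES/VIRT:

```python
# Poll a PID's resident/virtual memory from /proc/<pid>/status
# (illustrative sketch; equivalent to watching RES/VIRT in `top`).
import sys
import time

def mem_gib(pid: int) -> tuple[float, float]:
    """Return (resident, virtual) memory of `pid` in GiB."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    # VmRSS and VmSize are reported in kB; 2**20 kB == 1 GiB.
    return (int(fields["VmRSS"].split()[0]) / 2**20,
            int(fields["VmSize"].split()[0]) / 2**20)

if __name__ == "__main__":
    pid = int(sys.argv[1])
    while True:
        resident, virtual = mem_gib(pid)
        print(f"resident={resident:.2f}GiB virtual={virtual:.2f}GiB")
        time.sleep(1)
```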

Pants version: 2.14.0

OS: macOS

Additional info: N/A

stuhood commented 1 year ago

The virtual memory aspect is due to LMDB, which mmaps its entire storage.

But 2.5G of resident memory is very surprising… I see about 80-140MB outside of docker.
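A minimal sketch of that mmap effect, using the Python `lmdb` bindings as an analogy (Pants itself uses LMDB from Rust, not these bindings): opening an environment reserves the full `map_size` of address space up front, so virtual size jumps immediately while resident stays small.

```python
# Demonstrates LMDB's up-front mmap: VmSize grows by ~map_size on open,
# while VmRSS barely moves. Assumes the `lmdb` package from PyPI;
# /tmp/lmdb-demo is an arbitrary path for the sketch.
import lmdb

def status_kb(field: str) -> int:
    """Read a kB-valued field from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

before = status_kb("VmSize")
env = lmdb.open("/tmp/lmdb-demo", map_size=16 * 2**30)  # reserve 16 GiB
print(f"VmSize grew ~{(status_kb('VmSize') - before) / 2**20:.1f} GiB, "
      f"VmRSS is ~{status_kb('VmRSS') / 2**10:.0f} MiB")
env.close()
```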

huonw commented 1 year ago

I did a little bit more investigation:

Here's the resident memory over time while running under memray:

[image: memray resident-memory graph]

Corresponding logs, plus my annotations (marked with ##) for where the spikes occur, based on timestamps.

```
03:46:34.36 [DEBUG] File handle limit is: 1048576
03:46:34.39 [TRACE] Initializing ShardedLmdb at root "/root/.cache/pants/lmdb_store/files"

## first large spike starts

03:46:37.38 [TRACE] Initializing ShardedLmdb at root "/root/.cache/pants/lmdb_store/directories"
03:46:37.81 [TRACE] Initializing ShardedLmdb at root "/root/.cache/pants/lmdb_store/cache"
03:46:38.34 [DEBUG] Using [cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }, cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }] for process execution.

## first large spike ends

03:46:41.90 [TRACE] initial_polymorphic iteration 1000: 1356 nodes
03:46:44.60 [TRACE] initial_polymorphic iteration 2000: 2304 nodes
03:46:47.45 [TRACE] initial_polymorphic iteration 3000: 3292 nodes
03:46:50.46 [TRACE] initial_polymorphic iteration 4000: 4294 nodes
03:46:53.39 [TRACE] initial_polymorphic iteration 5000: 5298 nodes
03:46:56.25 [TRACE] initial_polymorphic iteration 6000: 6282 nodes
03:46:59.30 [TRACE] initial_polymorphic iteration 7000: 7269 nodes
03:47:02.17 [TRACE] initial_polymorphic iteration 8000: 8285 nodes
03:47:04.98 [TRACE] initial_polymorphic iteration 9000: 9279 nodes
03:47:07.87 [TRACE] initial_polymorphic iteration 10000: 10277 nodes
03:47:10.76 [TRACE] initial_polymorphic iteration 11000: 11308 nodes
03:47:13.53 [TRACE] initial_polymorphic iteration 12000: 12293 nodes
03:47:17.06 [TRACE] initial_polymorphic iteration 13000: 13277 nodes
03:47:21.06 [TRACE] initial_polymorphic iteration 14000: 14251 nodes
03:47:24.62 [TRACE] initial_polymorphic iteration 15000: 15239 nodes
03:47:27.64 [TRACE] initial_polymorphic iteration 16000: 16209 nodes
03:47:30.56 [TRACE] initial_polymorphic iteration 17000: 17207 nodes
03:48:28.75 [TRACE] rule_graph monomorphize: iteration 1000: live: 10942, minimal: 426, to_visit: 16792, total: 17726
03:48:38.98 [TRACE] rule_graph monomorphize: iteration 2000: live: 10790, minimal: 527, to_visit: 16429, total: 17941
03:48:52.21 [TRACE] rule_graph monomorphize: iteration 3000: live: 10747, minimal: 819, to_visit: 15979, total: 18219
03:49:02.98 [TRACE] rule_graph monomorphize: iteration 4000: live: 10605, minimal: 922, to_visit: 15601, total: 18427
03:49:14.39 [TRACE] rule_graph monomorphize: iteration 5000: live: 10450, minimal: 1051, to_visit: 15131, total: 18655
03:49:26.62 [TRACE] rule_graph monomorphize: iteration 6000: live: 10370, minimal: 1312, to_visit: 14619, total: 18921
03:49:38.30 [TRACE] rule_graph monomorphize: iteration 7000: live: 10303, minimal: 1580, to_visit: 14122, total: 19189
03:49:52.20 [TRACE] rule_graph monomorphize: iteration 8000: live: 10174, minimal: 1751, to_visit: 13616, total: 19438
03:50:01.27 [TRACE] rule_graph monomorphize: iteration 9000: live: 9959, minimal: 1793, to_visit: 13164, total: 19668
03:50:11.98 [TRACE] rule_graph monomorphize: iteration 10000: live: 9926, minimal: 2116, to_visit: 12479, total: 20003
03:50:23.77 [TRACE] rule_graph monomorphize: iteration 11000: live: 9846, minimal: 2367, to_visit: 11903, total: 20294
03:50:34.48 [TRACE] rule_graph monomorphize: iteration 12000: live: 9761, minimal: 2589, to_visit: 11433, total: 20551
03:50:44.30 [TRACE] rule_graph monomorphize: iteration 13000: live: 9602, minimal: 2684, to_visit: 10907, total: 20810
03:50:53.31 [TRACE] rule_graph monomorphize: iteration 14000: live: 9447, minimal: 2841, to_visit: 10409, total: 21063
03:51:04.32 [TRACE] rule_graph monomorphize: iteration 15000: live: 9364, minimal: 3061, to_visit: 9893, total: 21329
03:51:14.12 [TRACE] rule_graph monomorphize: iteration 16000: live: 9264, minimal: 3251, to_visit: 9358, total: 21615
03:51:24.39 [TRACE] rule_graph monomorphize: iteration 17000: live: 9182, minimal: 3489, to_visit: 8710, total: 21928
03:51:38.30 [TRACE] rule_graph monomorphize: iteration 18000: live: 9059, minimal: 3637, to_visit: 8306, total: 22159
03:51:51.74 [TRACE] rule_graph monomorphize: iteration 19000: live: 8985, minimal: 3874, to_visit: 7828, total: 22415
03:52:02.97 [TRACE] rule_graph monomorphize: iteration 20000: live: 8918, minimal: 4123, to_visit: 7226, total: 22720
03:52:12.99 [TRACE] rule_graph monomorphize: iteration 21000: live: 8784, minimal: 4255, to_visit: 6788, total: 22961
03:52:25.08 [TRACE] rule_graph monomorphize: iteration 22000: live: 8681, minimal: 4402, to_visit: 6252, total: 23232
03:52:36.01 [TRACE] rule_graph monomorphize: iteration 23000: live: 8628, minimal: 4696, to_visit: 5539, total: 23556
03:52:46.39 [TRACE] rule_graph monomorphize: iteration 24000: live: 8507, minimal: 4854, to_visit: 5101, total: 23798
03:52:56.57 [TRACE] rule_graph monomorphize: iteration 25000: live: 8440, minimal: 5114, to_visit: 4492, total: 24108
03:53:05.70 [TRACE] rule_graph monomorphize: iteration 26000: live: 8246, minimal: 5192, to_visit: 3982, total: 24344
03:53:17.88 [TRACE] rule_graph monomorphize: iteration 27000: live: 8195, minimal: 5488, to_visit: 3336, total: 24670
03:53:29.73 [TRACE] rule_graph monomorphize: iteration 28000: live: 8103, minimal: 5697, to_visit: 2841, total: 24933
03:53:38.84 [TRACE] rule_graph monomorphize: iteration 29000: live: 7852, minimal: 5686, to_visit: 2404, total: 25146
03:53:49.56 [TRACE] rule_graph monomorphize: iteration 30000: live: 7695, minimal: 5857, to_visit: 1855, total: 25420
03:54:03.91 [TRACE] rule_graph monomorphize: iteration 31000: live: 7627, minimal: 6120, to_visit: 1243, total: 25722
03:54:16.28 [TRACE] rule_graph monomorphize: iteration 32000: live: 7441, minimal: 6181, to_visit: 856, total: 25942
03:54:25.78 [TRACE] rule_graph monomorphize: iteration 33000: live: 7297, minimal: 6352, to_visit: 190, total: 26228
03:54:44.87 [DEBUG] Launching 1 roots (poll=false).
03:54:45.46 [DEBUG] computed 1 nodes in 0.591910 seconds. there are 7 total nodes.
03:54:46.91 [DEBUG] File handle limit is: 1048576
03:54:46.93 [TRACE] Initializing ShardedLmdb at root "/root/.cache/pants/lmdb_store/files"

## second large spike starts

03:54:49.54 [TRACE] Initializing ShardedLmdb at root "/root/.cache/pants/lmdb_store/directories"
03:54:49.88 [TRACE] Initializing ShardedLmdb at root "/root/.cache/pants/lmdb_store/cache"
03:54:50.23 [DEBUG] Using [cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }, cache::CommandRunner { inner: bounded::CommandRunner { inner: nailgun::CommandRunner { inner: local::CommandRunner { .. }, .. }, .. }, .. }] for process execution.

## second large spike ends

03:54:53.46 [TRACE] initial_polymorphic iteration 1000: 1174 nodes
03:54:56.32 [TRACE] initial_polymorphic iteration 2000: 2166 nodes
03:54:58.98 [TRACE] initial_polymorphic iteration 3000: 3108 nodes

## first small drop

03:55:21.30 [TRACE] rule_graph monomorphize: iteration 1000: live: 2588, minimal: 397, to_visit: 3226, total: 4122
03:55:30.48 [TRACE] rule_graph monomorphize: iteration 2000: live: 2449, minimal: 532, to_visit: 2825, total: 4351
03:55:41.32 [TRACE] rule_graph monomorphize: iteration 3000: live: 2380, minimal: 786, to_visit: 2386, total: 4621
03:55:51.40 [TRACE] rule_graph monomorphize: iteration 4000: live: 2280, minimal: 953, to_visit: 2008, total: 4849
03:56:00.88 [TRACE] rule_graph monomorphize: iteration 5000: live: 2109, minimal: 1049, to_visit: 1535, total: 5067
03:56:11.24 [TRACE] rule_graph monomorphize: iteration 6000: live: 2029, minimal: 1310, to_visit: 1034, total: 5329

## second small drop

03:56:21.44 [TRACE] rule_graph monomorphize: iteration 7000: live: 1952, minimal: 1554, to_visit: 525, total: 5594
03:56:34.04 [DEBUG] specs are: Specs(includes=RawSpecs(description_of_origin='CLI arguments', address_literals=(), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=True, from_change_detection=False), ignores=RawSpecs(description_of_origin='CLI arguments', address_literals=(), file_literals=(), file_globs=(), dir_literals=(), dir_globs=(), recursive_globs=(), ancestor_globs=(), unmatched_glob_behavior=<GlobMatchErrorBehavior.error: 'error'>, filter_by_global_options=False, from_change_detection=False))
03:56:34.05 [DEBUG] changed_options are: ChangedOptions(since=None, diffspec=None, dependees=<DependeesOption.NONE: 'none'>)
03:56:34.07 [DEBUG] Launching 1 roots (poll=false).
03:56:34.19 [TRACE] Completed: pants.init.engine_initializer.EngineInitializer.setup_graph_extended.union_membership_singleton

## last small spike

03:56:38.46 [TRACE] Completed: pants.init.engine_initializer.EngineInitializer.setup_graph_extended.build_configuration_singleton
03:56:38.47 [TRACE] Completed: session_values
03:56:38.61 [TRACE] Completed: pants.engine.internals.options_parsing.parse_options
03:56:38.62 [TRACE] Completed: pants.engine.internals.options_parsing.scope_options
03:56:38.62 [TRACE] Completed: pants.engine.internals.options_parsing.scope_options
03:56:38.64 [TRACE] Completed: construct_scope_anonymous_telemetry
03:56:38.64 [TRACE] Completed: construct_scope_stats
03:56:38.66 [WARN] Please either set `enabled = true` in the [anonymous-telemetry] section of pants.toml to enable sending anonymous stats to the Pants project to aid development, or set `enabled = false` to disable it. No telemetry sent for this run. An explicit setting will get rid of this message. See https://www.pantsbuild.org/v2.14/docs/anonymous-telemetry for details.
03:56:38.68 [TRACE] Completed: pants.goal.anonymous_telemetry.construct_callback
03:56:38.69 [WARN] Please run with `--plugins=hdrhistogram` if you would like histogram summaries to be shown at the end of the run, or permanently add `[GLOBAL].plugins = ['hdrhistogram']`. This will cause Pants to install the `hdrhistogram` dependency from PyPI.
03:56:38.70 [TRACE] Completed: pants.goal.stats_aggregator.construct_callback
03:56:38.72 [TRACE] Completed: pants.engine.streaming_workunit_handler.construct_workunits_callback_factories
03:56:38.74 [TRACE] Completed: select
03:56:38.81 [DEBUG] computed 1 nodes in 4.730668 seconds. there are 12 total nodes.
2.14.0
03:56:39.06 [DEBUG] Async completion is disabled: waiting for workunit callbacks to complete...
```

Details: https://gist.github.com/huonw/8ecde888ac3b9f7ae2ccd7f39320770d

Theoretically memray should be able to give more precise info about where the allocations are, but I couldn't convince it to work in Docker; e.g. installing `python3-dbg` to get symbols just led to `Failed to parse allocation records in /code/memray-output.bin Reason: Could not open file /proc/self/fd/3: No such file or directory` 🤷‍♂️
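One thing that might be worth retrying: memray also has a programmatic `Tracker` API, which could sidestep the `memray run` failure above by writing the capture file straight to a bind-mounted path. Untested here, and `/host-out` is a hypothetical mount point:

```python
# Untested alternative to `memray run`: attach memray programmatically and
# write the capture to a bind-mounted host path (`/host-out` is hypothetical,
# e.g. from `docker run -v "$PWD:/host-out" ...`).
from memray import Tracker

with Tracker("/host-out/pants-bootstrap.bin"):
    # ... the code under investigation, e.g. invoking the Pants entry point ...
    pass
```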

rafrafek commented 1 year ago

I was trying to stop the pantsd daemon and found this issue.

I copied the `-no-pantsd` flag from the issue title and it doesn't work.

Edit:

`--no-pantsd` is the correct one. Running `./pants -no-pantsd help` starts the daemon without any warning about using a wrong flag.