mamba-org / boa

The fast conda package builder, based on mamba
https://boa-build.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

mambabuild: Reloading output folder, memory consumption, almost never ending and KeyError. #213

Open Thomas-Z opened 3 years ago

Thomas-Z commented 3 years ago

Hi,

I'm not sure what causes this problem or how to analyze it, but here it is.

This problem occurs with:

Replacing conda mambabuild with conda build works in the very same environment: it took ~10 minutes and used up to ~5 GB of RAM.

Downgrading to the following version works:

This took ~25 minutes and used up to ~3 GB of RAM.

We encounter this problem in many different internal projects (so we are staying on older boa/mamba versions for now), and I think (though I'm not 100% sure) it only fails when a custom internal channel is used.

The following output comes from a GitLab CI run.

docker image: continuumio/miniconda3:latest
conda config --env --set conda_build.pkg_format 2
conda config --set custom_channels.chan1 http://packages.xxx.xx/conda
conda config --set custom_channels.chan2 http://packages.xxx.xx/conda

conda install 'mamba>=0.17' -c conda-forge -y
mamba install 'boa>=0.7.1' -c conda-forge -y

export PYTHON=3.9
export CONDA_BUILT_PACKAGES=build/conda_packages
export OCTANT_CHANNEL=chan2
$ conda mambabuild --debug --python=$PYTHON --output-folder $CONDA_BUILT_PACKAGES --override-channels -c $OCTANT_CHANNEL -c conda-forge conda_recipe/
DEBUG:conda_build.index:found subdirs set()
Updating build index: /builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages
Building repodata for /builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages/noarch
INFO:conda_build.index:Building repodata for /builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages/noarch
No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.16
WARNING:conda_build.metadata:No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.16
Adding in variants from internal_defaults
INFO:conda_build.variants:Adding in variants from internal_defaults
Attempting to finalize metadata for casys.nadir
INFO:conda_build.metadata:Attempting to finalize metadata for casys.nadir
conda-forge/linux-64     Using cache
conda-forge/noarch       Using cache
pkgs/main/linux-64       Using cache
pkgs/main/noarch         Using cache
pkgs/r/linux-64          Using cache
pkgs/r/noarch            Using cache
Reloading output folder: 
/builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages
DEBUG:conda_build.index:found subdirs {'linux-64', 'noarch'}
Building repodata for /builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages/linux-64
INFO:conda_build.index:Building repodata for /builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages/linux-64
Reloading output folder: 
/builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages
BUILD START: ['casys.nadir-0.5-py39_20211112114944.conda']
Reloading output folder: 
/builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages
Reloading output folder: 
/builds/xMr4zdhB/1/casys/casys-toolbox/build/conda_packages
ERROR: Job failed: execution took longer than 6h0m0s seconds
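Since the job above relies on custom_channels, it may help to spell out what that setting does. Per the conda configuration documentation, each key is a channel name and each value is a base location that does not include the name; conda appends the name to form the channel URL, so chan2 resolves to http://packages.xxx.xx/conda/chan2. The Python sketch below models only that rule (plus the fallback to the default channel_alias, https://conda.anaconda.org) to show which repodata URLs the build would be expected to query. The function names are illustrative and are not part of conda's API.

```python
from __future__ import annotations

DEFAULT_CHANNEL_ALIAS = "https://conda.anaconda.org"


def channel_url(name: str, custom_channels: dict[str, str],
                channel_alias: str = DEFAULT_CHANNEL_ALIAS) -> str:
    """Return the base URL for a named channel (simplified model of conda's rule).

    If the name is a key of custom_channels, its value is the base location;
    otherwise channel_alias is used. Either way the channel name is appended.
    The special "defaults" channel is resolved differently by conda and is
    not modelled here.
    """
    base = custom_channels.get(name, channel_alias)
    return f"{base.rstrip('/')}/{name}"


def repodata_url(name: str, subdir: str, custom_channels: dict[str, str]) -> str:
    """URL of repodata.json for one platform subdirectory, e.g. linux-64."""
    return f"{channel_url(name, custom_channels)}/{subdir}/repodata.json"


if __name__ == "__main__":
    # Values taken from the conda config commands in the CI job above.
    custom = {
        "chan1": "http://packages.xxx.xx/conda",
        "chan2": "http://packages.xxx.xx/conda",
    }
    for channel in ("chan2", "conda-forge"):
        for subdir in ("linux-64", "noarch"):
            print(repodata_url(channel, subdir, custom))
```

Note that chan1 and chan2 share a base location but still resolve to different channel URLs, because the name is appended.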

I don't know whether the debug logs are reliable, but it looks like the problem comes from the "Reloading output folder" step (found here). It got stuck there for 6 hours. Sometimes it manages to finish the build phase and then gets stuck at the same "Reloading output folder" step during the test phase.

I ran it locally to get more information and the result was the same. While it is stuck, CPU usage stays at 100% and memory consumption keeps increasing (more than 10 GB after ~45 minutes).
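Figures like these (wall time, peak memory) are easier to compare between conda build and conda mambabuild if both runs are measured the same way. The sketch below is one possible stdlib-only Python wrapper, assuming a Unix host because it uses the resource module; run_and_measure is a hypothetical helper name. On Linux, ru_maxrss is reported in kilobytes and, for RUSAGE_CHILDREN, is the peak of the largest single terminated descendant rather than the total for the whole process tree, so treat it as a lower bound on what the build consumed.

```python
from __future__ import annotations

import resource
import subprocess
import sys
import time


def run_and_measure(cmd: list[str], timeout: float | None = None) -> tuple[int, float, int]:
    """Run cmd to completion; return (exit code, elapsed seconds, peak RSS in KiB).

    Peak RSS comes from getrusage(RUSAGE_CHILDREN), which covers terminated,
    waited-for children of this process. It is a running maximum, so run one
    command per fresh Python process to attribute the peak to that command.
    """
    start = time.monotonic()
    completed = subprocess.run(cmd, timeout=timeout)  # raises TimeoutExpired on timeout
    elapsed = time.monotonic() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # macOS reports bytes, Linux reports kilobytes
    return completed.returncode, elapsed, peak


if __name__ == "__main__":
    if len(sys.argv) > 1:
        code, seconds, kib = run_and_measure(sys.argv[1:])
        print(f"exit={code} elapsed={seconds / 60:.1f} min peak_rss={kib / 1024 ** 2:.2f} GiB")
    else:
        print("usage: measure.py COMMAND [ARGS...]")
```

Example (measure.py is a hypothetical file name): python measure.py conda mambabuild --override-channels -c conda-forge conda_recipe/. For a hung build, passing a timeout makes subprocess.run kill the direct child and raise subprocess.TimeoutExpired instead of waiting indefinitely.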

I managed to capture two different error outputs.

The first one is memory-related (I'm not sure at which stage it occurred; I did not have enough log output left):

INFO conda.core.link:_reverse_actions(790): ===> REVERSING PACKAGE LINK: defaults::blas-1.0-mkl <===
  prefix=/opt/conda/conda-bld/casys.nadir_1636557591162/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh

INFO:conda.core.link:===> REVERSING PACKAGE LINK: conda-forge::_libgcc_mutex-0.1-conda_forge <===
  prefix=/opt/conda/conda-bld/casys.nadir_1636557591162/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh

INFO conda.core.link:_reverse_actions(790): ===> REVERSING PACKAGE LINK: conda-forge::_libgcc_mutex-0.1-conda_forge <===
  prefix=/opt/conda/conda-bld/casys.nadir_1636557591162/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh

done
DEBUG:conda.common.signals:de-registering handler for Signals.SIGABRT
DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGABRT
DEBUG:conda.common.signals:de-registering handler for Signals.SIGINT
DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGINT
DEBUG:conda.common.signals:de-registering handler for Signals.SIGTERM
DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGTERM
DEBUG:conda.common.signals:de-registering handler for Signals.SIGQUIT
DEBUG conda.common.signals:signal_handler(58): de-registering handler for Signals.SIGQUIT
Traceback (most recent call last):
  File "/opt/conda/bin/conda-mambabuild", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/boa/cli/mambabuild.py", line 142, in main
    call_conda_build(action, config)
  File "/opt/conda/lib/python3.9/site-packages/boa/cli/mambabuild.py", line 120, in call_conda_build
    result = api.build(
  File "/opt/conda/lib/python3.9/site-packages/conda_build/api.py", line 186, in build
    return build_tree(
  File "/opt/conda/lib/python3.9/site-packages/conda_build/build.py", line 3094, in build_tree
    test(pkg, config=metadata.config.copy(), stats=stats)
  File "/opt/conda/lib/python3.9/site-packages/conda_build/build.py", line 2884, in test
    environ.create_env(metadata.config.test_prefix, actions, config=metadata.config,
  File "/opt/conda/lib/python3.9/site-packages/conda_build/environ.py", line 910, in create_env
    execute_actions(actions, index)
  File "/opt/conda/lib/python3.9/site-packages/conda/common/io.py", line 88, in decorated
    return f(*args, **kwds)
  File "/opt/conda/lib/python3.9/site-packages/conda/plan.py", line 321, in execute_actions
    execute_instructions(plan, index, verbose)
  File "/opt/conda/lib/python3.9/site-packages/conda/plan.py", line 533, in execute_instructions
    cmd(state, arg)
  File "/opt/conda/lib/python3.9/site-packages/conda/instructions.py", line 73, in UNLINKLINKTRANSACTION_CMD
    unlink_link_transaction.execute()
  File "/opt/conda/lib/python3.9/site-packages/conda/core/link.py", line 250, in execute
    self._execute(tuple(concat(interleave(itervalues(self.prefix_action_groups)))))
  File "/opt/conda/lib/python3.9/site-packages/conda/core/link.py", line 713, in _execute
    raise CondaMultiError(tuple(concatv(
conda.CondaMultiError: [Errno 12] Cannot allocate memory

The second one is an error I've seen regularly but whose origin I never figured out (could it be related to a memory problem?):

Traceback (most recent call last):
  File "/opt/conda/bin/conda-mambabuild", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/boa/cli/mambabuild.py", line 142, in main
    call_conda_build(action, config)
  File "/opt/conda/lib/python3.9/site-packages/boa/cli/mambabuild.py", line 120, in call_conda_build
    result = api.build(
  File "/opt/conda/lib/python3.9/site-packages/conda_build/api.py", line 186, in build
    return build_tree(
  File "/opt/conda/lib/python3.9/site-packages/conda_build/build.py", line 3094, in build_tree
    test(pkg, config=metadata.config.copy(), stats=stats)
  File "/opt/conda/lib/python3.9/site-packages/conda_build/build.py", line 2884, in test
    environ.create_env(metadata.config.test_prefix, actions, config=metadata.config,
  File "/opt/conda/lib/python3.9/site-packages/conda_build/environ.py", line 904, in create_env
    display_actions(actions, index)
  File "/opt/conda/lib/python3.9/site-packages/conda/exports.py", line 236, in display_actions
    actions['LINK'] = [index[d] for d in actions['LINK']]
  File "/opt/conda/lib/python3.9/site-packages/conda/exports.py", line 236, in <listcomp>
    actions['LINK'] = [index[d] for d in actions['LINK']]
KeyError: Dist(channel='conda-forge', dist_name='pyinterp-0.9.2-mkl_py39h6cb4401_0', name='pyinterp', fmt='.tar.bz2', version='0.9.2', build_string='mkl_py39h6cb4401_0', build_number=0, base_url=None, platform=None)

Usually the KeyError refers to a package from our internal conda channel but, interestingly enough, in this case it did not.

I know I am not providing much information here, but I hope it is enough to understand what's going wrong.

Thanks!

wolfv commented 2 years ago

@Thomas-Z is it possible for you to remove the defaults channel?

wolfv commented 2 years ago

Or enable strict channel priority?
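As background for this suggestion: the conda documentation describes strict channel priority (enabled with conda config --set channel_priority strict) as ignoring packages in lower-priority channels whenever a package with the same name exists in a higher-priority channel. The Python sketch below models only that filtering rule on a toy list of records, to show how it shrinks the set of candidates the solver has to consider. It is a simplified illustration, not how conda or libmamba actually implement priority, and the Record type and data are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical, minimal package record."""
    channel: str
    name: str
    version: str


def apply_priority(records: list[Record], channels: list[str], strict: bool) -> list[Record]:
    """Return the records a solver would be allowed to consider.

    channels is ordered from highest to lowest priority. With strict=False,
    every record from an enabled channel stays a candidate (flexible priority
    prefers higher channels but excludes nothing). With strict=True, each
    package name is served only by the highest-priority channel that has it.
    """
    rank = {ch: i for i, ch in enumerate(channels)}
    # Records from channels that are not enabled are never candidates.
    enabled = [r for r in records if r.channel in rank]
    if not strict:
        return enabled
    best: dict[str, int] = {}
    for r in enabled:
        best[r.name] = min(best.get(r.name, len(channels)), rank[r.channel])
    return [r for r in enabled if rank[r.channel] == best[r.name]]


if __name__ == "__main__":
    # Hypothetical data: pygsl exists in both the internal channel and conda-forge.
    records = [
        Record("chan2", "pygsl", "2.3.0.1"),
        Record("conda-forge", "pygsl", "2.3.0"),
        Record("conda-forge", "numpy", "1.21.4"),
    ]
    print(apply_priority(records, ["chan2", "conda-forge"], strict=True))
```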

wolfv commented 2 years ago

Also, can you share the meta.yaml with only the dependencies that should come from defaults / conda-forge? Maybe that would be enough to reproduce the issue.

Thomas-Z commented 2 years ago

> @Thomas-Z is it possible for you to remove the defaults channel?

I usually use the options --override-channels -c $OCTANT_CHANNEL -c conda-forge, so the defaults channel is not used (it may have been used when producing the second log, but that was not my usual configuration).

> Also can you share the meta.yaml with only dependencies that should come from defaults / conda-forge? Maybe that would be enough to reproduce the issue?

I will work on a reproducible example (creating fake empty packages that contain only dependencies), but I won't have time for this before January.
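One way to build the reproducible example described here is to generate placeholder recipes with no source and no files, only requirements; conda-build accepts such metapackage-style recipes. The Python sketch below produces the text of one such meta.yaml. The function name, the version string and the choice of noarch: generic are assumptions for illustration, and the dependency list would have to be copied from the real recipe.

```python
from __future__ import annotations


def placeholder_recipe(name: str, version: str, run: list[str]) -> str:
    """Return meta.yaml text for an empty package that only declares run dependencies."""
    lines = [
        "package:",
        f"  name: {name}",
        f'  version: "{version}"',
        "",
        "build:",
        "  number: 0",
        "  noarch: generic  # assumption: no compiled content, so no platform subdir",
        "",
        "requirements:",
        "  run:",
    ]
    # One YAML list item per conda match spec, e.g. "pyinterp 0.9.2".
    lines.extend(f"    - {spec}" for spec in run)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical package name and dependencies.
    print(placeholder_recipe("fake-nadir-deps", "0.0.1", ["numpy", "pyinterp 0.9.2"]))
```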

Thanks

Thomas-Z commented 2 years ago

I have not had time to create a reproducible example, but I have an additional clue regarding the KeyError problem.

When using mamba build, I get the following debug message just before the error (the package name may vary):

DEBUG:urllib3.connectionpool:http://abc.xx.yyy.fr/ "GET /conda/octantdev/linux-64/pygsl-2.3.0.1-py39h6f358a3_15.tar.bz2 HTTP/1.1" 200 1010495

By contrast, conda build generates the following message (which does not result in an error):

DEBUG:urllib3.connectionpool:http://proxy.yyy.fr:8080/ "GET http://abc.xx.yyy.fr/conda/octantdev/linux-64/pygsl-2.3.0.1-py39h6f358a3_15.tar.bz2 HTTP/1.1" 200 1010495

Could it be related to proxy information not being correctly passed when using mamba build?
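The two debug lines differ in a telling way: in the conda build line, urllib3 connects to proxy.yyy.fr:8080 and puts the absolute package URL in the GET request, which is what a plain-HTTP request through a proxy looks like, while in the mamba build line it connects straight to abc.xx.yyy.fr. That suggests (without proving) that the download on the mambabuild code path did not pick up the proxy. As a first check, Python's urllib.request can report which proxy the process environment would select for a URL; the sketch below wraps that. It only reads the http_proxy, https_proxy and no_proxy environment variables, not the proxy_servers key of .condarc, and proxy_for is a hypothetical helper name.

```python
from __future__ import annotations

from urllib.parse import urlsplit
from urllib.request import getproxies_environment, proxy_bypass_environment


def proxy_for(url: str, env_proxies: dict[str, str] | None = None) -> str | None:
    """Return the proxy URL that environment settings would select for url, or None.

    env_proxies maps scheme -> proxy URL, plus an optional "no" entry holding
    the no_proxy list (the format returned by getproxies_environment()).
    When omitted, the current process environment is read.
    """
    proxies = getproxies_environment() if env_proxies is None else env_proxies
    parts = urlsplit(url)
    if proxy_bypass_environment(parts.hostname or "", proxies):
        return None  # host matches an entry in no_proxy
    return proxies.get(parts.scheme)


if __name__ == "__main__":
    channel = "http://abc.xx.yyy.fr/conda/octantdev/linux-64/repodata.json"
    print(proxy_for(channel) or "no proxy (direct connection)")
```

Running this inside the same CI job, right before conda mambabuild, would at least show whether the proxy is visible to Python processes at that point.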

The full error looks like this:

DEBUG:urllib3.connectionpool:http://abc.xx.yyy.fr/ "GET /conda/octantdev/linux-64/pygsl-2.3.0.1-py39h6f358a3_15.tar.bz2 HTTP/1.1" 200 1010495
Traceback (most recent call last):
  File "/builds/octantng/tox_workdir/py39-build/bin/conda-mambabuild", line 10, in <module>
    sys.exit(main())
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/boa/cli/mambabuild.py", line 239, in main
    call_conda_build(action, config)
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/boa/cli/mambabuild.py", line 211, in call_conda_build
    result = api.build(
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/api.py", line 186, in build
    return build_tree(
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/build.py", line 3088, in build_tree
    packages_from_this = build(metadata, stats,
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/build.py", line 2051, in build
    output_metas = expand_outputs([(m, need_source_download, need_reparse_in_env)])
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/render.py", line 788, in expand_outputs
    for (output_dict, m) in deepcopy(_m).get_output_metadata_set(permit_unsatisfiable_variants=False):
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/metadata.py", line 2120, in get_output_metadata_set
    conda_packages = finalize_outputs_pass(ref_metadata, conda_packages, pass_no=0,
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/metadata.py", line 781, in finalize_outputs_pass
    fm = finalize_metadata(om, parent_metadata=parent_metadata,
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/render.py", line 546, in finalize_metadata
    build_unsat, host_unsat = add_upstream_pins(m,
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/render.py", line 408, in add_upstream_pins
    host_deps, host_unsat, extra_run_specs_from_host = _read_upstream_pin_files(m, 'host',
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/render.py", line 378, in _read_upstream_pin_files
    extra_run_specs = get_upstream_pins(m, actions, env)
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/render.py", line 364, in get_upstream_pins
    loc, dist = execute_download_actions(m, actions, env=env, package_subset=pkg)[pkg]
  File "/builds/octantng/tox_workdir/py39-build/lib/python3.9/site-packages/conda_build/render.py", line 331, in execute_download_actions
    _loc = os.path.join(pkg_dir, index[pkg].fn)
KeyError: Dist(channel='http://abc.xx.yyy.fr/conda/octantdev', dist_name='pygsl-2.3.0.1-py39h6f358a3_15', name='pygsl', fmt='.tar.bz2', version='2.3.0.1', build_string='py39h6f358a3_15', build_number=15, base_url=None, platform=None)
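Both KeyError tracebacks fail on a plain dictionary lookup (index[d] or index[pkg]), and a Dist includes its channel, so the lookup only succeeds if the channel is spelled exactly as in the index keys. One hypothesis consistent with the logs, though not confirmed in this thread, is that the index is keyed by one channel spelling while the action refers to another, for example a channel name versus its full URL http://abc.xx.yyy.fr/conda/octantdev. The Python sketch below reproduces that failure mode with a simplified stand-in for Dist (a NamedTuple with two fields, not conda's real class) and adds a diagnostic error message listing index entries with the same dist_name under any channel.

```python
from __future__ import annotations

from typing import NamedTuple


class Dist(NamedTuple):
    """Simplified stand-in for conda's Dist; only two fields are modelled."""
    channel: str
    dist_name: str


def lookup(index: dict[Dist, str], dist: Dist) -> str:
    """Exact lookup, like index[d] in the tracebacks, with a more informative error."""
    try:
        return index[dist]
    except KeyError:
        # Is the same build present under a different channel spelling?
        near = [k.channel for k in index if k.dist_name == dist.dist_name]
        raise KeyError(f"{dist} not in index; same dist_name under channels: {near}") from None


if __name__ == "__main__":
    name = "pygsl-2.3.0.1-py39h6f358a3_15"
    # Hypothetical: the index stores the channel by name ...
    index = {Dist("octantdev", name): "record"}
    # ... while the action refers to the same build by its full URL.
    wanted = Dist("http://abc.xx.yyy.fr/conda/octantdev", name)
    try:
        lookup(index, wanted)
    except KeyError as exc:
        print(exc)
```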

struktured commented 2 years ago

I've also run into this with

I am also using the -c argument when building.

conda build does not exhibit the problem; only conda mambabuild does.

bryango commented 1 year ago

I also encounter the never-ending KeyError: Dist(channel=..., ...). I am using a custom channel mirror, and the error disappears when I temporarily remove the custom channel URL:

diff a/.condarc b/.condarc
-custom_channels:
-  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud

The custom channel works fine for mamba install. The error only appears in conda mambabuild.