simonsobs / soconda

Simons Observatory Conda Tools
BSD 2-Clause "Simplified" License

Missing 'satellite_hwp_solve_amplitudes' key in test case #35

Open BrianJKoopman opened 5 months ago

BrianJKoopman commented 5 months ago

I'm building a docker image with soconda installed in it to run JupyterHub on Kubernetes at the site. After a bit of trial and error yesterday, I have an image that I believe is quite close to functional; however, I'm running into a failing test case when I run run_tests.sh:

TOAST INFO: MapMaker PCG stalled after   60 iterations and 66.81 s
TOAST INFO: SolveAmplitudes  finished solver in 66.81 s
TOAST INFO: MapMaker  finished template amplitude solve in 70.93 s
TOAST INFO: MapMaker begin build of final binning covariance
TOAST INFO: MapMaker  finished build of final covariance in 0.21 s
TOAST INFO: Wrote /home/jovyan/soconda/soconda/toast_test_output/template_hwpss/satellite_hwp/satellite_hwp_hits.fits in 0.01 s
TOAST INFO: MapMaker begin map binning
TOAST INFO: MapMaker  finished binning in 0.20 s
TOAST INFO: Wrote /home/jovyan/soconda/soconda/toast_test_output/template_hwpss/satellite_hwp/satellite_hwp_binmap.fits in 0.01 s
TOAST INFO: MapMaker begin apply template amplitudes
TOAST INFO: MapMaker  finished apply template amplitudes in 0.26 s
TOAST INFO: MapMaker begin final map binning
TOAST INFO: MapMaker  finished final binning in 0.20 s
TOAST INFO: Wrote /home/jovyan/soconda/soconda/toast_test_output/template_hwpss/satellite_hwp/satellite_hwp_map.fits in 0.01 s
TOAST INFO: MapMaker  finished output write in 0.01 s
Proc 0: Traceback (most recent call last):
Proc 0:   File "/opt/conda/lib/python3.11/unittest/case.py", line 57, in testPartExecutor
    yield
Proc 0:   File "/opt/conda/lib/python3.11/unittest/case.py", line 623, in run
    self._callTestMethod(testMethod)
Proc 0:   File "/opt/conda/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
Proc 0:   File "/opt/conda/lib/python3.11/site-packages/toast/tests/template_hwpss.py", line 678, in test_satellite_hwp
    oamps = data[f"{mapper.name}_solve_amplitudes"][offset_tmpl.name]
            ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Proc 0:   File "/opt/conda/lib/python3.11/site-packages/toast/data.py", line 37, in __getitem__
    return self._internal[key]
           ~~~~~~~~~~~~~~^^^^^
Proc 0: KeyError: 'satellite_hwp_solve_amplitudes'

[0]error Proc 1: Traceback (most recent call last):
Proc 1:   File "/opt/conda/lib/python3.11/unittest/case.py", line 57, in testPartExecutor
    yield
Proc 1:   File "/opt/conda/lib/python3.11/unittest/case.py", line 623, in run
    self._callTestMethod(testMethod)
Proc 1:   File "/opt/conda/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
Proc 1:   File "/opt/conda/lib/python3.11/site-packages/toast/tests/template_hwpss.py", line 678, in test_satellite_hwp
    oamps = data[f"{mapper.name}_solve_amplitudes"][offset_tmpl.name]
            ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Proc 1:   File "/opt/conda/lib/python3.11/site-packages/toast/data.py", line 37, in __getitem__
    return self._internal[key]
           ~~~~~~~~~~~~~~^^^^^
Proc 1: KeyError: 'satellite_hwp_solve_amplitudes'

[1]error --------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[372ebbd04e02:03856] 1 more process has sent help message help-mpi-api.txt / mpi-abort

The configs I'm building with can be found on the koopman/docker-configs branch. I'm running with Python 3.11.8 in this image, as that's what's provided in the conda environment of the jupyter/minimal-notebook image this is based on.

Is this failing test case a concern?

tskisner commented 5 months ago

Thanks for the report, will try to investigate soon.

tskisner commented 5 months ago

When running soconda.sh in the container, did you install OS packages for mpich or openmpi? If so, you should set the MPICC environment variable to point to the system-installed mpicc compiler wrapper. If this environment variable is not set, then the conda package for openmpi will be installed (which might be what you want, not sure).

tskisner commented 5 months ago

For example, at NERSC, we set MPICC to point to the Cray compiler wrapper ("cc") so that mpi4py builds against the correct MPI implementation.
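
As a rough sketch (the mpicc path and the soconda.sh invocation below are illustrative placeholders, not the exact NERSC setup):

# Point mpi4py's build at the system MPI compiler wrapper before running soconda.sh.
# At NERSC this is the Cray wrapper "cc"; on a host with an OS-installed
# MPICH/OpenMPI it is typically "mpicc".
export MPICC=cc        # or: export MPICC=$(which mpicc)
./soconda.sh           # run with your usual options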

BrianJKoopman commented 5 months ago

When running soconda.sh in the container, did you install OS packages for mpich or openmpi? If so, you should set the MPICC environment variable to point to the system-installed mpicc compiler wrapper. If this environment variable is not set, then the conda package for openmpi will be installed (which might be what you want, not sure).

Ah, no, that's probably the issue then. One suggestion I was going to make was a "dependencies" section in the soconda docs that lists things like gcc and g++; mpich/openmpi would fit nicely there too. I'll try that out and get back to you, thanks for the suggestion!
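
For illustration, a minimal sketch of those OS-level build dependencies (package names are my guess based on this thread, not an official list; these lines would go in a Dockerfile RUN step or be run interactively in the image as root):

apt-get update && apt-get install -y gcc g++
# An OS-level MPI (e.g. libopenmpi-dev) is only needed if you want mpi4py built
# against a system MPI via MPICC; otherwise the conda openmpi package is installed.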

tskisner commented 5 months ago

A critical feature of soconda is that it uses the conda compiler packages to build everything (not any system compilers). The only exception is the mpi4py package, which needs to be built against whatever MPI implementation works on a given cluster (e.g. one with the needed network drivers for the interconnect).

So for local (single-node) installations the conda-provided openmpi should be fine, but for multi-node systems there is probably a "recommended" MPI installation set up by the system administrators.

However, for this docker container I guess the conda packages for MPI should be fine. Unless you already installed some other MPI in the container?

Anyway, if the OS-provided MPI libraries work, then we can just use those instead of the conda-provided ones.
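
One quick way to check which MPI implementation mpi4py ended up linked against (a generic mpi4py diagnostic, not something specific to soconda):

python -c 'from mpi4py import MPI; print(MPI.Get_library_version())'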

BrianJKoopman commented 5 months ago

Ah, interesting. So I'm building on a "minimal image" provided by the Jupyter Project. It's ultimately based on the ubuntu:22.04 image and comes with an already set up conda environment. However, that didn't seem to include gcc or g++, since things wouldn't compile until I installed those.

I'm not very familiar with conda, so I was hoping to leverage the existing conda environment and the install script here to have things "just work". Is it unusual that gcc/g++ weren't found in the existing conda environment? Perhaps it's something else I'm doing wrong with the conda environment.

BrianJKoopman commented 5 months ago

For example, at NERSC, we set MPICC to point to the Cray compiler wrapper ("cc") so that mpi4py is using the correct MPI implementation.

This script looks pretty helpful. I actually suspect that in my installation steps I'm not really activating the conda environment. I was trying to follow a suggestion from this blog post to do that, but maybe I'll just put everything in a script like the one you've linked here.

tskisner commented 5 months ago

Ah, another thing: how are you installing the conda base environment? All of my installs are based on a base/root environment that was installed with the helper script:

./tools/bootstrap_base.sh /path/to/base

and then running soconda.sh with -b /path/to/base.
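
Put together, that sequence looks roughly like this (paths are placeholders):

./tools/bootstrap_base.sh /path/to/base
./soconda.sh -b /path/to/base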

tskisner commented 5 months ago

Ah, never mind, I see above that you are using the default anaconda base that comes with the docker image. In that case there might be a challenge if it is not configured to use the conda-forge channel by default. Let me go see what is in that minimal image.

tskisner commented 5 months ago

Ok, looks like quay.io/jupyter/minimal-notebook is indeed using conda-forge as the default channel. In that case you could probably just run these two lines to install conda-build into the base environment before running soconda.sh.
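
The exact lines aren't reproduced here, but installing conda-build into the base environment from conda-forge would look something like:

conda install -y -n base -c conda-forge conda-build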