i4Ds / Karabo-Pipeline

The Karabo Pipeline can be used as a Digital Twin for SKA
https://i4ds.github.io/Karabo-Pipeline/
MIT License

512 sarus #526

Closed Lukas113 closed 8 months ago

Lukas113 commented 10 months ago

closes #449 closed #189

For testing purposes, this PR needs its dev-dependencies, referenced in environment.yaml and meta.yaml. However, BEFORE merging into the default branch, they need to be adapted, and the corresponding builds of the Karabo-Feedstock must be built at more or less the same time.

This PR was initially intended to be much smaller. However, because I wasn't able to test my changes with the old setup, I also had to define a dev & test setup for Karabo so as not to interfere with the live conda wheels and Docker images.

This PR also includes changes in the Karabo-Feedstock, where a lot of dependencies were adjusted. Originally this was because we had some Open MPI builds which needed to be adapted. However, while I was working on this PR, we decided with PR #521 to remove the MPI builds from Karabo, which may make it look like that work was unnecessary. However, I also needed to adapt the build-string setup to enable best-practice conda builds, where the build string usually depends on the Python version, package hash and build number. Only then can the conda installer select the dependency wheels properly and decide whether to download & install a new wheel or take it from the cache. The same goes for dev builds. In addition, I loosened the constraints on well-known and stable dependencies. This had the positive side effect that we are no longer as constrained as we used to be when upgrading dependencies, which solves issue #449.
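For illustration, here is a minimal sketch of what such a build-string setup can look like in a feedstock recipe; the meta.yaml contents below are an assumption for illustration, not the actual Karabo-Feedstock recipe:

```yaml
# Hypothetical meta.yaml excerpt -- illustrative, not the actual
# Karabo-Feedstock recipe.
# The build string encodes the Python version (CONDA_PY), the variant hash
# (PKG_HASH) and the build number (PKG_BUILDNUM), so conda can tell builds
# of the same package version apart and resolve/cache wheels correctly.
build:
  number: 0
  string: "py{{ CONDA_PY }}h{{ PKG_HASH }}_{{ PKG_BUILDNUM }}"

requirements:
  host:
    - python
  run:
    - python
```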

This PR has the following changes in this repository:

codecov[bot] commented 10 months ago

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (e0517ae) 65.76% compared to head (745d018) 65.72%.

| Files | Patch % | Lines |
|---|---|---|
| `karabo/util/file_handler.py` | 70.58% | 10 Missing :warning: |
| `karabo/util/dask.py` | 42.85% | 4 Missing :warning: |
| `karabo/__init__.py` | 71.42% | 2 Missing :warning: |
| `karabo/imaging/image.py` | 0.00% | 2 Missing :warning: |
| `karabo/simulation/beam.py` | 0.00% | 1 Missing :warning: |
| `karabo/sourcedetection/evaluation.py` | 50.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #526      +/-   ##
==========================================
- Coverage   65.76%   65.72%   -0.04%
==========================================
  Files          47       48       +1
  Lines        4504     4537      +33
==========================================
+ Hits         2962     2982      +20
- Misses       1542     1555      +13
```

:umbrella: View full report in Codecov by Sentry.

Lukas113 commented 10 months ago

There are still issues:

Lukas113 commented 10 months ago

Ok, regarding boost I found out what the issue is/was. The issue is that boost always needs to be pinned at runtime to the version that was present at build time. Take casacore as an example: when I tried installing another boost version in an environment that already had casacore installed, I got a wheel with a different build number and package hash.

Fortunately, conda-build provides a solution to this via pin_run_as_build: you do a build for each boost version and pin the runtime requirement of each build to the boost version it was built against.
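A sketch of how this can be expressed in a feedstock's conda_build_config.yaml; the listed boost versions are illustrative, only the pin_run_as_build mechanism itself is the point here:

```yaml
# Hypothetical conda_build_config.yaml excerpt for the feedstock.
# One build is produced per listed boost version (the version list here
# is illustrative), and pin_run_as_build pins the runtime boost
# requirement of each build to the x.x version it was built against.
boost:
  - "1.74"
  - "1.78"

pin_run_as_build:
  boost:
    max_pin: x.x
```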

Lukas113 commented 10 months ago

The corresponding changes were made in the Karabo-Feedstock and the installation succeeded. Now we have a bdsf build for each compatible boost version >= 1.74 on anaconda.org, all with the same build number but different package hashes. Other parts of the feedstock have also been adapted to enable decent dependency resolving and hopefully prevent build & installation errors.

The only issue that still remains is the one with the Docker image (see previous post). As soon as this is resolved, I'd be happy about a review so that it's ready to merge.

I'm not really sure about the merging process, since the builds on the feedstock need to be triggered at the same time. Probably at release time? In addition, I'm not sure whether it will break older Karabo releases, since there are new builds of the feedstock with the same package version and therefore a higher build number. I think they shouldn't break? Unfortunately, I can't really test that.

Lukas113 commented 10 months ago

There's currently an issue with the GitHub runners (see issue 3025) which prevents me from properly testing the workflows before they land in the default branch.

What I mean is that it's not impossible; however, testing has the side effect of building and publishing a Docker container each time I try to test one of the GitHub workflows I changed.

kenfus commented 9 months ago

For me this is pretty hard to review. Anything I should focus on? Overall, it seems OK but like I said, it's hard to test or review.

Lukas113 commented 9 months ago

@kenfus Just do a review wherever you feel you can provide some feedback. If you don't feel comfortable reviewing the details, I suggest focusing on the design choices.

Lukas113 commented 9 months ago

Please remember not to merge this PR yet, because of the dev-dependencies in environment.yaml & meta.yaml.

Lukas113 commented 9 months ago

Ok, merging should be fine now. I've pinned Karabo's feedstock dependencies, and the dependencies within the feedstock itself, to fixed build numbers. This should ensure that future updates of the same package version with a higher build number don't affect older installations. I've tested the installation locally, and everything seems to work fine:

[image: screenshot of the successful local installation]

I hope that I didn't break older conda wheels with the updates I did. However, I don't see a good way to fix that in case they are broken. With the new setup of fixed build numbers, this shouldn't happen anymore in the future.

The current implementation of fixed build numbers in the feedstock is a little messy because they're hard-coded there. However, I think the feedstock needs a refactoring anyway; until we get to that point, I think we can leave it as it is.
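To illustrate the idea, a hedged sketch of such hard-coded build-number pins as conda match specs; the package names and versions are taken from the solver output further down in this thread, while the surrounding recipe layout is assumed:

```yaml
# Hypothetical meta.yaml run-requirements excerpt. The match specs follow
# the pattern "<name> <version> *_<build-nr>", which fixes the exact build
# number so that a later rebuild of the same version (with a higher build
# number) is not picked up implicitly.
requirements:
  run:
    - bluebild 0.1.0 *_0
    - oskarpy 2.8.3 *_0
    - montagepy 6.0.0 *_0
```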

As mentioned, the current Dockerfiles don't allow a native MPI hook by Sarus, because the MPI implementation doesn't live in a standard location. @deiruch mentioned that mpich should be installable via yum or apt. Maybe this makes the installation of MPI easier and could be done on the GitHub runners. However, I think this should be done in a separate PR, since this PR is already too big, which is totally my fault.

sfiruch commented 9 months ago

LGTM, apart from the mpi-hook/mpich-compilation things we talked about last week. (Use a binary mpich release instead of recompiling it.)

Lukas113 commented 9 months ago

Ok, I'll have a look if it works locally and on the runners.

Lukas113 commented 9 months ago

@deiruch mpich is installable through apt. However, the only version available for ubuntu-22.04 is 4.0-2. So to ensure compatibility, I tried to pin mpich to 4.0 in environment.yaml. However, that environment is not resolvable; I got the error message posted below. The issue is that as soon as h5py is pinned to any mpich build, the environment with mpich 4.0 cannot be resolved. Thus, the only way I see to resolve this is to make a decision:

Either don't allow a native mpi-hook but instead get an h5py mpich-wheel, OR downgrade mpich from 4.1 to 4.0 and disallow an mpich-wheel of h5py.

Since I don't know the consequences of either decision, I'm not sure which one is better or worse.
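For reference, a hedged sketch of the two environment.yaml pins that lead to the solver error below; the rest of the file is omitted and the exact spelling in the real file may differ:

```yaml
# Hypothetical environment.yaml excerpt -- only the two pins relevant to
# the conflict; the rest of the dependency list is omitted.
dependencies:
  - mpich=4.0            # match the apt-provided mpich 4.0-2 on ubuntu-22.04
  - h5py=*=mpi_mpich*    # request an mpich-enabled h5py build
```

Dropping either of the two pins makes the environment resolvable again, which is exactly the decision described above.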

(karabo) root@9616a2df8d03:/repo# conda env update -f="environment.yaml"
Channels:
 - i4ds
 - nvidia/label/cuda-11.7.0
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed
Channels:
 - i4ds
 - nvidia/label/cuda-11.7.0
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package h5py-3.4.0-mpi_mpich_py310h83ac6b3_2 requires python >=3.10,<3.11.0a0, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ bluebild 0.1.0 *_0 is installable with the potential options
│  ├─ bluebild 0.1.0 would require
│  │  └─ python >=3.9,<3.10.0a0  with the potential options
│  │     ├─ python [3.9.0|3.9.1|...|3.9.7], which can be installed;
│  │     ├─ python [3.9.0|3.9.1|...|3.9.9] would require
│  │     │  └─ python_abi 3.9.* *_cp39, which can be installed;
│  │     └─ python [3.9.10|3.9.12|3.9.16|3.9.17|3.9.18] would require
│  │        └─ python_abi 3.9 *_pypy39_pp73, which can be installed;
│  └─ bluebild 0.1.0 would require
│     ├─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
│     └─ python_abi 3.10.* *_cp310, which conflicts with any installable versions previously reported;
├─ h5py * mpi_mpich* is installable with the potential options
│  ├─ h5py [2.10.0|2.9.0|...|3.6.0], which can be installed;
│  ├─ h5py [2.10.0|3.1.0] would require
│  │  └─ python >=3.6,<3.7.0a0 , which conflicts with any installable versions previously reported;
│  ├─ h5py [2.10.0|3.1.0|...|3.6.0] would require
│  │  └─ python >=3.7,<3.8.0a0 , which conflicts with any installable versions previously reported;
│  ├─ h5py [2.10.0|3.1.0|...|3.6.0] would require
│  │  └─ python >=3.8,<3.9.0a0 , which conflicts with any installable versions previously reported;
│  ├─ h5py [3.4.0|3.6.0] would require
│  │  └─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
│  ├─ h5py [3.7.0|3.8.0] would require
│  │  ├─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
│  │  └─ python_abi 3.10.* *_cp310, which conflicts with any installable versions previously reported;
│  ├─ h5py [3.7.0|3.8.0] would require
│  │  ├─ python >=3.11,<3.12.0a0 , which conflicts with any installable versions previously reported;
│  │  └─ python_abi 3.11.* *_cp311, which conflicts with any installable versions previously reported;
│  ├─ h5py 3.7.0 would require
│  │  ├─ python >=3.7,<3.8.0a0 , which conflicts with any installable versions previously reported;
│  │  └─ python_abi 3.7.* *_cp37m, which conflicts with any installable versions previously reported;
│  ├─ h5py [3.7.0|3.8.0] would require
│  │  ├─ python >=3.8,<3.9.0a0 , which conflicts with any installable versions previously reported;
│  │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
│  ├─ h5py [3.7.0|3.8.0] would require
│  │  └─ hdf5 >=1.12.2,<1.12.3.0a0 mpi_mpich_* with the potential options
│  │     ├─ hdf5 1.12.2 would require
│  │     │  └─ openssl >=1.1.1q,<1.1.2a , which can be installed;
│  │     ├─ hdf5 1.12.2, which can be installed;
│  │     └─ hdf5 [1.12.2|1.14.0] would require
│  │        └─ openssl >=1.1.1s,<1.1.2a , which can be installed;
│  ├─ h5py 3.7.0 would require
│  │  └─ hdf5 >=1.12.1,<1.12.2.0a0 mpi_mpich_* with the potential options
│  │     ├─ hdf5 1.12.1 would require
│  │     │  └─ openssl >=1.1.1l,<1.1.2a , which can be installed;
│  │     ├─ hdf5 1.12.1, which can be installed;
│  │     └─ hdf5 1.12.1 would require
│  │        └─ openssl >=1.1.1k,<1.1.2a , which can be installed;
│  ├─ h5py [3.10.0|3.9.0] would require
│  │  └─ mpich >=4.1.2,<5.0a0 , which can be installed;
│  ├─ h5py [3.10.0|3.9.0] would require
│  │  └─ python_abi 3.12.* *_cp312, which conflicts with any installable versions previously reported;
│  ├─ h5py 3.8.0 would require
│  │  └─ hdf5 >=1.14.0,<1.14.1.0a0 mpi_mpich_* with the potential options
│  │     ├─ hdf5 [1.12.2|1.14.0], which can be installed (as previously explained);
│  │     ├─ hdf5 1.14.0 would require
│  │     │  └─ openssl >=1.1.1t,<1.1.2a , which can be installed;
│  │     └─ hdf5 1.14.0, which can be installed;
│  ├─ h5py 3.9.0 would require
│  │  └─ mpich >=4.1.1,<5.0a0 , which can be installed;
│  └─ h5py 3.9.0 would require
│     └─ hdf5 >=1.14.1,<1.14.2.0a0 mpi_mpich_*, which can be installed;
├─ montagepy 6.0.0 *_0 is uninstallable because there are no viable options
│  ├─ montagepy 6.0.0 would require
│  │  ├─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
│  │  └─ python_abi 3.10.* *_cp310, which conflicts with any installable versions previously reported;
│  └─ montagepy 6.0.0 would require
│     ├─ python >=3.9,<3.10.0a0  with the potential options
│     │  ├─ python [3.9.0|3.9.1|...|3.9.7], which can be installed;
│     │  ├─ python [3.9.0|3.9.1|...|3.9.9], which cannot be installed (as previously explained);
│     │  └─ python [3.9.10|3.9.12|3.9.16|3.9.17|3.9.18], which can be installed (as previously explained);
│     └─ python_abi 3.9.* *_cp39 but there are no viable options
│        ├─ python_abi 3.9 conflicts with any installable versions previously reported;
│        └─ python_abi 3.9 would require
│           └─ python 3.9.* *_cpython, which conflicts with any installable versions previously reported;
├─ mpich 4.0**  is uninstallable because it conflicts with any installable versions previously reported;
├─ oskarpy 2.8.3 *_0 is uninstallable because there are no viable options
│  ├─ oskarpy 2.8.3 would require
│  │  ├─ oskar 2.8.3 *_0, which requires
│  │  │  └─ hdf5 >=1.14.2,<1.14.3.0a0  but there are no viable options
│  │  │     ├─ hdf5 1.14.2 conflicts with any installable versions previously reported;
│  │  │     └─ hdf5 1.14.2 would require
│  │  │        └─ openssl >=3.1.2,<4.0a0 , which conflicts with any installable versions previously reported;
│  │  └─ python_abi 3.10.* *_cp310, which conflicts with any installable versions previously reported;
│  └─ oskarpy 2.8.3 would require
│     ├─ oskar 2.8.3 *_0, which cannot be installed (as previously explained);
│     └─ python >=3.9,<3.10.0a0  with the potential options
│        ├─ python [3.9.0|3.9.1|...|3.9.7], which can be installed;
│        ├─ python [3.9.0|3.9.1|...|3.9.9], which cannot be installed (as previously explained);
│        └─ python [3.9.10|3.9.12|3.9.16|3.9.17|3.9.18], which can be installed (as previously explained);
├─ python 3.9**  is installable with the potential options
│  ├─ python [3.9.0|3.9.1|...|3.9.7], which can be installed;
│  ├─ python [3.9.0|3.9.1|...|3.9.9], which cannot be installed (as previously explained);
│  └─ python [3.9.10|3.9.12|3.9.16|3.9.17|3.9.18], which can be installed (as previously explained);
├─ ska-gridder-nifty-cuda 0.3.0 *_0 is uninstallable because there are no viable options
│  ├─ ska-gridder-nifty-cuda 0.3.0 would require
│  │  └─ python_abi 3.9.* *_cp39 but there are no viable options
│  │     ├─ python_abi 3.9 conflicts with any installable versions previously reported;
│  │     └─ python_abi 3.9, which cannot be installed (as previously explained);
│  └─ ska-gridder-nifty-cuda 0.3.0 would require
│     └─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
├─ ska-sdp-datamodels 0.1.3 *_0 is installable and it requires
│  └─ h5py >=3.7.0  with the potential options
│     ├─ h5py [3.7.0|3.8.0], which cannot be installed (as previously explained);
│     ├─ h5py [3.7.0|3.8.0], which cannot be installed (as previously explained);
│     ├─ h5py 3.7.0, which cannot be installed (as previously explained);
│     ├─ h5py [3.7.0|3.8.0], which cannot be installed (as previously explained);
│     ├─ h5py [3.7.0|3.8.0], which can be installed (as previously explained);
│     ├─ h5py 3.7.0, which can be installed (as previously explained);
│     ├─ h5py [3.10.0|3.9.0], which can be installed (as previously explained);
│     ├─ h5py [3.10.0|3.9.0], which cannot be installed (as previously explained);
│     ├─ h5py 3.8.0, which can be installed (as previously explained);
│     ├─ h5py 3.9.0, which can be installed (as previously explained);
│     ├─ h5py 3.9.0, which can be installed (as previously explained);
│     └─ h5py [3.10.0|3.7.0|3.8.0|3.9.0] conflicts with any installable versions previously reported;
└─ tools21cm 2.0.2 *_0 is installable and it requires
   └─ python 3.9.*  with the potential options
      ├─ python [3.9.0|3.9.1|...|3.9.7], which can be installed;
      ├─ python [3.9.0|3.9.1|...|3.9.9], which cannot be installed (as previously explained);
      └─ python [3.9.10|3.9.12|3.9.16|3.9.17|3.9.18], which can be installed (as previously explained).

sfiruch commented 9 months ago

For now we don't need h5py with mpi. It's only useful to allow parallel writing of HDF5 files, which we don't use AFAIK (see https://docs.h5py.org/en/stable/mpi.html).
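A hedged sketch of how the non-MPI route could look in environment.yaml, assuming conda-forge's usual nompi build-string convention for h5py (not taken from the actual file):

```yaml
# Hypothetical environment.yaml excerpt: explicitly request a serial
# (non-MPI) h5py build so mpich no longer constrains h5py. This assumes
# conda-forge's usual "nompi_*" build-string prefix for h5py.
dependencies:
  - h5py=*=nompi*
```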

Lukas113 commented 9 months ago

I've discovered a bug which I have to fix. Currently, everything works as intended for Docker containers. However, in Singularity and Sarus containers the conda setup doesn't work yet, which means the wrong interpreter is selected by default. Also, switching venvs is not straightforward because conda activate doesn't work out of the box.

For some reason, CMD and ENTRYPOINT don't seem to work for non-Docker containers (maybe just the way I used them, or in general? not sure). However, I need them because otherwise .bashrc won't get executed, which is essential to switch a conda venv. Note: we need a venv setup so we don't accidentally damage the base environment.

Currently, I'm not sure how to solve this issue. Therefore, it's work in progress.

Lukas113 commented 9 months ago

There's also a bug with the mpich hook. The installation in the container (conda dependency resolving & apt installation) and the conda link to the standard location worked; however, the hook didn't. This is also work in progress.

Lukas113 commented 9 months ago

The issue of not being able to switch venvs in Singularity containers is not straightforward. It lies in how Docker and Singularity handle user IDs and in which directories and files get mounted by Singularity, which makes the initialization of conda very difficult. Regarding user IDs: one enters a Singularity container with the same user ID as the user who launched the process, while the files & directories of the Docker image are created as root. A detailed description of the issue can be seen here. Currently I'm exploring solutions. I want to avoid creating a Singularity definition file, because that would imply that one has to obtain this file before creating a Singularity container, which is not optimal.

Lukas113 commented 9 months ago

Ok, the Singularity issue seems to be fixed. The following Dockerfile commands did the job:

# set bash-env accordingly for interactive and non-interactive shells for docker & singularity
RUN mkdir opt/etc && \
    echo "conda activate karabo" >> ~/.bashrc && \
    cat ~/.bashrc | sed -n '/conda initialize/,/conda activate/p' > /opt/etc/conda_init_script
ENV BASH_ENV=/opt/etc/conda_init_script
RUN echo "source $BASH_ENV" >> /etc/bash.bashrc && \
    echo "source $BASH_ENV" >> /etc/profile

It configures the conda init & activate process for each use case: interactive and non-interactive bash shells, for both Docker & Singularity (bash reads $BASH_ENV for non-interactive shells, while /etc/bash.bashrc and /etc/profile cover interactive and login shells on Debian/Ubuntu-based images).

Lukas113 commented 9 months ago

Currently still not working are Singularity containers (due to permission issues) and the native MPI hook for Sarus containers. As discussed in the meeting on 21-12-23, these will be fixed in another PR.

Lukas113 commented 8 months ago

I've fixed the MPI issue and have tested a simple example on CSCS, `srun -N2 -n2 -C gpu -A sk05 sarus run --mpi ghcr.io/i4ds/karabo-pipeline:0.21.dev1 python -c 'print("ABCDEFG")'`, with and without the mpi-hook, and it worked (the output was 2x ABCDEFG). The only thing left to do is to make Karabo also work in Singularity containers, which is work in progress.

Also, all MPI tests passed with the following command: `srun -N2 -n2 -C gpu -A sk05 sarus run --mpi ghcr.io/i4ds/karabo-pipeline:0.21.dev1 pytest --only-mpi /opt/conda/envs/karabo/lib/python3.9/site-packages/karabo/test`