Open qwofford opened 3 years ago
@trandles-lanl Singularity support is still part of the requirements right?
Yes. It's in use still at other sites we care about. Interestingly, it seems that some sites are starting to build with podman and run with Singularity. I'm not exactly suggesting we add podman support at this point, but it's something for a future enhancement.
I don't believe I have access to a cluster where Singularity is supported. Singularity used to be supported on Summit, but it is not supported any longer. Is there a system that would be a good test-case for Singularity?
Making the container driver for Singularity will be a longer term effort I think, but I'm happy to chip away at it.
Another question that came up during the meeting today:
Is Singularity still the Charliecloud-alternative container runtime that makes the most sense to support?
So do we need to add to our build functionality Singularity or Podman?
I could be mistaken, but I believe the comparison would be Singularity vs an OCI-compliant runtime, like one that Podman can use...but I'm not sure how we support an OCI compliant runtime if it doesn't interface with a container image manager...so ultimately Podman may be something we have to support.
I think it makes sense to place BEE in the context of tools that serve these functions:
Here's what I think BEE is, placed in the context of similar tools, and we can discuss:
.*-compose products aim to achieve this: docker-compose, singularity-compose, podman-compose, etc. These products allow users to describe a set of containers in a single configuration file, which launches or halts the whole set at once.
A .*-compose container orchestrator can be viewed as a workflow orchestrator that only supports 1-step workflows: "start these N containers or fail".
Distributed parallel applications cannot use .*-compose container orchestrators because .*-compose tools expect multiple containers to launch on the same node or on virtual nodes, and they define software networking to facilitate communication between those containers. With .*-compose tools, the network and the systems where they run are abstracted away. Further, distributed parallel applications must not rely on any software abstraction that increases communication overhead between nodes. System schedulers and communication libraries have historically provided the high-performance interface developers require for distributed parallel apps. Software networking breaks the assumptions made by message-passing communication libraries, and since all .*-compose tools rely on software networking, either another kind of container orchestration mechanism or a new kind of communication library that supports high-performance software networking is required. BEE targets the former by passing the burden to the container runtime + scheduler + communication library (a solved problem).
So far, placed in context, I believe we can say:
.*-compose tools are container orchestrators, but they are not compatible with existing distributed parallel applications.
Is this a paper?
Getting back to the point, which container tool do we extend support to after Charliecloud? Here are some related considerations:
dockerd might be tricky. How do you docker pull without dockerd? We could ignore the fact that Docker consists of two distinct tools, which has been the implicit plan in my mind.
Singularity is OCI compliant, or at least it has an OCI-compliant mode, so maybe supporting Charliecloud and Podman will cover all the bases. Getting Podman to work will be a trick, I bet!
Singularity is still the supported runtime at LLNL. I did see that it's no longer supported on Summit. I think podman might be a better target, but it would need to be the rootless podman configuration. Red Hat is doing more work on enhancing that capability, so it might be premature. Podman is supposed to be a command-for-command drop-in replacement for docker, so if you have the proper stuff for docker build, it should just work for podman.
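To make the drop-in idea concrete, here is a rough sketch of how a build step could treat docker and podman interchangeably. This is not BEE's actual code: `build_command` and `pick_builder` are hypothetical names, and it assumes only the common `build -t <tag> -f <file> <context>` form that both CLIs accept.

```python
import shutil


def build_command(tool, context_dir, tag, dockerfile=None):
    """Return the argv list for `<tool> build` (hypothetical helper).

    Only the executable name differs between docker and podman here;
    the rest of the command line is shared.
    """
    if tool not in ("docker", "podman"):
        raise ValueError(f"unsupported build tool: {tool}")
    cmd = [tool, "build", "-t", tag]
    if dockerfile:
        cmd += ["-f", dockerfile]
    cmd.append(context_dir)
    return cmd


def pick_builder(preferred=("podman", "docker"), which=shutil.which):
    """Return the first tool in `preferred` found on PATH, or None.

    `which` is injectable so the lookup can be tested without either
    tool being installed.
    """
    for tool in preferred:
        if which(tool):
            return tool
    return None
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. Whether rootless podman really behaves identically for a given Dockerfile on a given system would still need to be tested on that system.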
If so, does the build PR break it?
Yes. The build PR, which has been incorporated into develop, does break the original Singularity capability, which was at least to run with a container. I am going to add an issue to fix at least that capability and assign it to myself.