lanl / BEE

build: Does Singularity still need to be supported? #368

Open qwofford opened 3 years ago

qwofford commented 3 years ago

If so, does the build PR break it?

rstyd commented 3 years ago

@trandles-lanl Singularity support is still part of the requirements, right?

trandles-lanl commented 3 years ago

Yes. It's in use still at other sites we care about. Interestingly, it seems that some sites are starting to build with podman and run with Singularity. I'm not exactly suggesting we add podman support at this point, but it's something for a future enhancement.

qwofford commented 3 years ago

I don't believe I have access to a cluster where Singularity is supported. Singularity used to be supported on Summit, but it is not supported any longer. Is there a system that would be a good test-case for Singularity?

Making the container driver for Singularity will be a longer term effort I think, but I'm happy to chip away at it.

qwofford commented 3 years ago

Another question that came up during the meeting today:

Is Singularity still the Charliecloud-alternative container runtime that makes the most sense to support?

pagrubel commented 3 years ago

> Yes. It's in use still at other sites we care about. Interestingly, it seems that some sites are starting to build with podman and run with Singularity. I'm not exactly suggesting we add podman support at this point, but it's something for a future enhancement.

So do we need to add to our build functionality Singularity or Podman?

qwofford commented 3 years ago

I could be mistaken, but I believe the comparison would be Singularity vs an OCI-compliant runtime, like one that Podman can use...but I'm not sure how we support an OCI compliant runtime if it doesn't interface with a container image manager...so ultimately Podman may be something we have to support.

I think it makes sense to place BEE in the context of tools that serve similar functions. Here's what I think BEE is in that context, and we can discuss:

  1. "Docker" includes the Docker Engine container runtime, but Docker is more than just a container runtime. Docker also includes container image management and sharing tools (moderated by dockerd).
  2. Podman offers container image management to any OCI-compliant container runtime. Instead of using dockerd to manage services, it interfaces with systemd to manage services. Podman is designed to be a drop-in replacement for the set of tools known as "Docker". Like Docker, Podman assumes that it is important to support persistent services through a persistent system service manager (systemd), and we regard that assumption with suspicion.
  3. Charliecloud and Singularity are container runtimes. They are also container image management tools but they do not rely on persistent system service managers like systemd or dockerd. This is because Charliecloud/Singularity are primarily tools designed to run distributed applications on shared computing resources, where persistent services are not allowed.
  4. Container orchestrators are distinct from container runtimes and container image managers. The .*-compose products aim to achieve this: docker-compose, singularity-compose, podman-compose, etc. These products allow users to describe a set of containers in a single configuration file. This configuration file will launch/halt the set of containers all at once.
  5. Workflow orchestrators are distinct from container orchestrators, and perhaps BEE is unique in its support for distributed parallel application workflow orchestration? We represent user workflows in a graph database, and execute each step of that graph. These workflows may be complex. You might view a .*-compose container orchestrator as a workflow orchestrator that only supports 1-step workflows: "start these N containers or fail".
  6. BEE is not a container orchestrator or a container orchestrator interface. BEE includes a container runtime interface, and a containerized scheduler+communication library is the container orchestration tool.
  7. From BEE's perspective, it makes little sense to take advantage of .*-compose container orchestrators because .*-compose tools expect multiple containers to launch on the same node or virtual nodes, and they define software networking to facilitate communication between those containers. With .*-compose tools, the network and systems where they run are abstracted away. Further, distributed parallel applications must not rely on any software abstractions that increase communication overhead between nodes. System schedulers and communication libraries have historically provided the high-performance interface developers require for distributed parallel apps. Software networking spoils all the assumptions made by message-passing communication libraries, and since all .*-compose tools rely on software networking, either another kind of container orchestration mechanism or a new kind of communication library that supports high-performance software networking is required. BEE targets the former by passing the burden to the container runtime+scheduler+communication library (a solved problem).
  8. BEE is also a container image management interface (the build tool could be thought of this way). We only support Charliecloud right now, which includes container image management tools. Whether we support tools like Podman and Docker is an open question. The Singularity/Charliecloud comparison is more 1:1, since Singularity also supports image management without persistent daemon interactions.
  9. BEE is also a container runtime interface. This is evident from the need to support multiple container runtimes. Not much to say here.
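
To make the "container runtime interface" idea in item 9 concrete, here is a minimal sketch of what a per-runtime driver layer could look like. This is not BEE's actual code: the class names, `get_driver`, and the `DRIVERS` table are all hypothetical, and the only things taken from the real tools are the basic `ch-run IMAGE -- CMD` and `singularity exec IMAGE CMD` invocation forms.

```python
from __future__ import annotations

import shlex
from abc import ABC, abstractmethod


class ContainerRuntimeDriver(ABC):
    """Hypothetical interface: wrap a task command for one container runtime."""

    @abstractmethod
    def run_command(self, image: str, command: list[str]) -> list[str]:
        """Return the argv that runs `command` inside `image`."""


class CharliecloudDriver(ContainerRuntimeDriver):
    def run_command(self, image: str, command: list[str]) -> list[str]:
        # ch-run takes the image, then "--", then the command to run.
        return ["ch-run", image, "--", *command]


class SingularityDriver(ContainerRuntimeDriver):
    def run_command(self, image: str, command: list[str]) -> list[str]:
        # singularity exec takes the image, then the command to run.
        return ["singularity", "exec", image, *command]


# Hypothetical registry; a new runtime would be added here.
DRIVERS = {
    "charliecloud": CharliecloudDriver,
    "singularity": SingularityDriver,
}


def get_driver(name: str) -> ContainerRuntimeDriver:
    """Look up a driver by runtime name (case-insensitive)."""
    try:
        return DRIVERS[name.lower()]()
    except KeyError:
        raise ValueError(f"unsupported container runtime: {name!r}") from None


if __name__ == "__main__":
    # Image paths below are placeholders, not real images.
    for name, image in [("charliecloud", "/var/tmp/img"), ("singularity", "img.sif")]:
        argv = get_driver(name).run_command(image, ["hostname"])
        print(shlex.join(argv))
```

The point of the sketch is only that the workflow side hands over an image and a command, and the runtime-specific wrapping stays in one place, so restoring Singularity "run" support would not need to touch the build path.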

So far, placed in context, I believe we can say:

Is this a paper?

Getting back to the point, which container tool do we extend support to after Charliecloud? Here are some related considerations:

qwofford commented 3 years ago

Singularity is OCI compliant, or at least it has an OCI compliant mode...so maybe supporting Charliecloud and Podman will cover all the bases...getting Podman to work will be a trick, I bet!

trandles-lanl commented 2 years ago

Singularity is still the supported runtime at LLNL. I did see that it's no longer supported on Summit. I think podman might be a better target, but it would need to be the rootless podman configuration. Red Hat is doing more work on enhancing that capability, so it might be premature. Podman is supposed to be a command-for-command drop-in replacement for docker, so if you have the proper stuff for docker build it should just work for podman.
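
As a rough illustration of the "command-for-command" point, a build wrapper could in principle differ only in the executable name. The sketch below is an assumption about how that might look, not existing BEE code: `build_command` and its parameters are hypothetical, and it relies only on the `build -t TAG -f FILE CONTEXT` form that both `docker` and `podman` accept. Whether rootless podman actually behaves the same on a given cluster is a separate question.

```python
from __future__ import annotations

SUPPORTED_BUILDERS = ("docker", "podman")


def build_command(
    tool: str,
    tag: str,
    context: str = ".",
    dockerfile: str = "Dockerfile",
) -> list[str]:
    """Return the argv for an image build; only the executable name varies."""
    if tool not in SUPPORTED_BUILDERS:
        raise ValueError(f"unsupported build tool: {tool!r}")
    return [tool, "build", "-t", tag, "-f", dockerfile, context]


if __name__ == "__main__":
    # "bee-example" is a placeholder tag.
    for tool in SUPPORTED_BUILDERS:
        print(" ".join(build_command(tool, "bee-example")))
```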

pagrubel commented 2 years ago

> If so, does the build PR break it?

The answer is yes: the build PR, which has been incorporated into develop, does break the original Singularity capability, which was to at least run with a container. I am going to add an issue to fix at least that capability and assign it to myself.