mlcommons / algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
https://mlcommons.org/en/groups/research-algorithms/
Apache License 2.0

Building Singularity image crashes with "not an empty directory" #552

Closed andres-fr closed 1 year ago

andres-fr commented 1 year ago

Description/Reproduction:

When following the instructions given here to create a Singularity/Apptainer image:

pip3 install spython
cd algorithmic-efficiency/docker
spython recipe Dockerfile &> Singularity.def
singularity build --fakeroot mlcommons.sif Singularity.def

The last command (i.e. building the Singularity image) crashes with the following error:

Setting up algorithmic_efficiency repo
+ branch=main
+ framework=both
+ git_url=https://github.com/mlcommons/algorithmic-efficiency.git
+ git clone https://github.com/mlcommons/algorithmic-efficiency.git
fatal: destination path 'algorithmic-efficiency' already exists and is not an empty directory.
+ cd /algorithmic-efficiency
+ git checkout main

Explanation:

Observe that the automatically generated .def includes the following section:

%files
scripts/startup.sh /algorithmic-efficiency/docker/scripts/startup.sh

When executed, this section copies the file from the first path (on the host machine) to the second path (inside the container). To do so, it recreates the full destination path, and thereby creates an algorithmic-efficiency directory.

Then, further down the line, git will crash when attempting to clone into that directory, since it already exists.
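The mechanism can be illustrated without Singularity or git. The following is only a minimal sketch: the helper clone_would_be_rejected is a hypothetical name that approximates git's "already exists and is not an empty directory" check, and a temporary directory stands in for the container's filesystem root.

```python
import tempfile
from pathlib import Path


def clone_would_be_rejected(dest: Path) -> bool:
    """Approximate git's precondition (hypothetical helper, not git itself):
    cloning is refused when the destination is an existing, non-empty directory."""
    return dest.is_dir() and any(dest.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for the container's filesystem root
    dest = root / "algorithmic-efficiency"

    # Before the %files copy, the clone target does not exist yet.
    before = clone_would_be_rejected(dest)

    # Mimic the %files copy: the parent directories of the target are created.
    target = dest / "docker" / "scripts" / "startup.sh"
    target.parent.mkdir(parents=True)
    target.write_text("#!/bin/bash\n")

    # After the copy, the clone target exists and is no longer empty.
    after = clone_would_be_rejected(dest)

print(before, after)
```

The second value flips to True purely because of the copied startup.sh, which matches the order of events in the build log above.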

Fix:

Note that if git clone succeeded, the startup.sh file would end up in exactly the same place anyway. Note also that none of the commands in between depends on that file in any way. Therefore, removing the %files section from the generated .def should fix this issue.

To confirm this, I manually removed the %files section, which indeed allowed the build to continue. To pursue the cleanest fix, I created a Python script that mimics the spython command, with the only difference that it does not produce a %files section. Using the resulting file, the installation process continued without issues, so I'll propose a PR with the fix.
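For reference, the manual removal could also be done as a post-processing step on the generated text. This is only a sketch and not the approach taken in the PR: strip_section is a hypothetical helper, the example header lines are made up, and it assumes that every line starting with % opens a new section, which may not hold for arbitrary shell code inside %post.

```python
def strip_section(def_text: str, section: str = "%files") -> str:
    """Return def_text without the given section (its header and its body).

    Assumption: any line whose first non-blank character is '%' starts a section.
    """
    kept = []
    skipping = False
    for line in def_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("%"):
            # New section header: skip from here only if it is the one to drop.
            skipping = stripped.split()[0].lower() == section
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Made-up, shortened definition file; only the %files lines come from this issue.
example = """Bootstrap: docker
From: ubuntu
%files
scripts/startup.sh /algorithmic-efficiency/docker/scripts/startup.sh
%post
echo "Setting up algorithmic_efficiency repo"
"""

cleaned = strip_section(example)
print(cleaned)
```

Filtering the text keeps the original spython command untouched, but it relies on the layout of the generated file rather than on the parsed recipe.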

I'll paste the script in a comment for further reference.

andres-fr commented 1 year ago

Alright, so PR https://github.com/mlcommons/algorithmic-efficiency/pull/553 is the proposed fix.

For reference, the singularity_converter.py script (present in the PR) is:

"""
This script is a modification of the
``spython recipe Dockerfile &> Singularity.def`` command, implemented here:
github.com/singularityhub/singularity-cli/blob/master/spython/client/recipe.py
It converts the Docker recipe to Singularity, while suppressing any %files
section. Usage example:
python singularity_converter.py -i Dockerfile -o Singularity.def
"""

import argparse

from spython.main.parse.parsers import get_parser
from spython.main.parse.writers import get_writer

# globals
ENTRY_POINT = "/bin/bash"  # seems to be a good default
FORCE = False  # seems to be a good default
#
parser = argparse.ArgumentParser(description="Custom Singularity converter")
parser.add_argument('-i', '--input', type=str,
                    help="Docker input path", default="Dockerfile")
parser.add_argument('-o', '--output', type=str,
                    help="Singularity output path", default="Singularity.def")
args = parser.parse_args()
INPUT_DOCKERFILE_PATH = args.input
OUTPUT_SINGULARITY_PATH = args.output

# get the Docker parser and Singularity writer classes
DockerParser = get_parser("docker")
SingularityWriter = get_writer("singularity")

# parse Dockerfile into Singularity and suppress %files sections
recipeParser = DockerParser(INPUT_DOCKERFILE_PATH)
recipeWriter = SingularityWriter(recipeParser.recipe)
# the unpacking below expects exactly one entry in the parsed recipe
key, = recipeParser.recipe.keys()
recipeWriter.recipe[key].files = []

# convert to string and save to output file
result = recipeWriter.convert(runscript=ENTRY_POINT, force=FORCE)
with open(OUTPUT_SINGULARITY_PATH, "w") as f:
    f.write(result)