Yes, our LaTeX dependency is a problem. I would actually separate the LaTeX report generation from the cluster scripts and have it as a separate script that runs on the logged data. Then people can also run it on their own computer and install LaTeX packages as needed.
We basically have that already: when you set `generate_report = "never"` in the config, no report is generated automatically, and the user can call `python3 -m cluster_utils.scripts.generate_report` to generate the report manually on demand.
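For illustration, the manual workflow then looks roughly like this (option name and command as mentioned above; where exactly the option goes depends on your config file):

```sh
# In the cluster_utils settings file, disable automatic report generation:
#
#   generate_report = "never"
#
# After the run has finished, build the report from the logged data on demand:
python3 -m cluster_utils.scripts.generate_report
```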
I still find it more convenient to at least have the option to auto-generate it, though. Actually, I just had an idea for how we could keep that working relatively easily: we can provide a container with cluster_utils and all its dependencies installed. People who want to use it can then simply download that container instead of installing cluster_utils via pip.
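Getting the container could then be a single pull from a registry, e.g. along these lines (the image path is a made-up placeholder):

```sh
# Pull a prebuilt image containing cluster_utils and all its dependencies
apptainer pull cluster_utils.sif docker://ghcr.io/example/cluster_utils:latest
```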
I also think providing an (automatically built) container on GitHub is a good idea.
Actually, I think cluster_utils could be two fully separate packages: the server application that the user interacts with, and the client integrated into the user's code. I don't think we need to actually separate them, but it would be a good idea to provide a minimal dependency set for only the client. The user's project would then depend only on that part, which would still be installed via pip.
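On the packaging side, that could look something like this (hypothetical extra name, not how cluster_utils is currently packaged):

```sh
# The user's project only needs the lightweight client part:
pip install cluster_utils

# The full server application is pulled in via an optional extra:
pip install "cluster_utils[server]"
```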
I realised that using a container to run the cluster_utils main process doesn't work so easily, as it can't submit jobs from inside the container. At least not unless I install the Slurm tools in the container as well, which doesn't seem practical.
In this case there is another rather simple solution, though: I built a container with only pdflatex in it (see below), named the image file `pdflatex`, and put it on the PATH. So basically it works as a standalone executable that runs pdflatex.
```
Bootstrap: docker
From: ubuntu:22.04

%post
    set -e
    export DEBIAN_FRONTEND=noninteractive
    # the texlive packages are in the universe repository; use the jammy
    # repo to match the Ubuntu 22.04 base image
    echo "deb http://us.archive.ubuntu.com/ubuntu jammy universe" >> /etc/apt/sources.list
    apt-get update
    apt-get install -y texlive-latex-base texlive-latex-extra
    # cleanup to reduce container size
    apt-get clean

%runscript
    pdflatex "$@"
```
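To spell out the usage (assuming the definition file is saved as pdflatex.def; use singularity instead of apptainer on older systems):

```sh
# Build the image; depending on the system this may need --fakeroot or root
apptainer build pdflatex pdflatex.def

# SIF images are directly executable, so putting the file on the PATH makes
# `pdflatex foo.tex` run the %runscript, i.e. pdflatex inside the container
mv pdflatex ~/.local/bin/
pdflatex report.tex
```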
I'll put that somewhere in the documentation and then I think this issue can be closed.
That sounds great for solving the LaTeX issue, although I think the question of how to provide a standalone cluster_utils container is still relevant. Is it possible to mount the host binaries into the container and link them?
Edit: as per this comment, it might be possible: https://groups.google.com/a/lbl.gov/g/singularity/c/syLcsIWWzdo/m/dWCiUyCPAQAJ
Hm, not sure. On the ML Cloud the Slurm binaries are located in `/usr/bin`. Binding that whole directory into the container sounds like a bad idea.
In that case, binding only `sbatch` should be sufficient, I guess.
I think you can only bind directories, not individual files, so I'm not sure if that would easily work. Do we actually have a use case where a container for running the cluster_utils main process is really needed? If yes, we should track it in a dedicated issue.
I just checked: you can indeed bind individual files. The use case is that everyone can pull the latest (automatically built) container from GitHub and no longer needs to install cluster_utils manually. Seems convenient to me.
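A quick sketch of what the file bind could look like (cluster_utils.sif is a hypothetical image; in practice sbatch also needs the Slurm configuration and the munge socket from the host, so additional binds would likely be required):

```sh
# Bind the single sbatch binary into the container and check that it is
# callable from inside (single-file binds need overlay/underlay support)
apptainer exec --bind /usr/bin/sbatch cluster_utils.sif sbatch --version
```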
Generating the report at the end of a grid search fails on Galvani with this error:
I assume the corresponding package is simply not installed there.
I'm not sure what the best solution to this is. One option could be to provide a container with all packages needed for the report. This would make us independent of what is provided on the cluster, but the user would somehow need to download that container, which could be a bit annoying. Maybe that could happen automatically during the installation of cluster_utils.