```dockerfile
# mambaorg/micromamba:1.5.8-bookworm-slim
FROM mambaorg/micromamba@sha256:abcb3ae7e3521d08e1fdeaff63131765b34e4f29b6a8a2c28660036b53841569

# python 3.11.8-slim-bullseye intel64
FROM python@sha256:a2d01031695ff170831430810ee30dd06d8413b08f72ad978b43fd10daa6b86e

LABEL maintainer="Alessio Vignoli" \
      name="alessiovignoli3/stimulus:latest" \
      description="Docker image containing python packages required for stimulus using modules"

RUN micromamba install -y -n base -c defaults -c bioconda -c conda-forge \
    python=3.11.8 \
    typing_extensions=4.11.0 \
    importlib_metadata=7.1.0 \
    numpy=1.26 \
    pytorch-lightning=2.0.1 \
    polars=0.20.19 \
    scikit-learn=1.3.0 \
    ray-tune=2.12.0 \
    ray-train=2.12.0 \
    procps-ng=4.0.4 \
    matplotlib==3.8.2 \
    && micromamba clean -a -y

ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"

USER root
```
Using this image I only managed to run it once on the CRG cluster; the other times it always throws this error:

```text
N E X T F L O W ~ version 23.10.1
Launching `main.nf` [deadly_ptolemy] DSL2 - revision: 32ea66cbec
executor > crg (1)
[7a/be75a4] process > CHECK_MODEL:CHECK_TORCH_MODEL (titanic_stimulus.json-titanic_stimulus.csv) [100%] 1 of 1, failed: 1 ✘
[-        ] process > HANDLE_DATA:INTERPRET_JSON                                                 -
[-        ] process > HANDLE_DATA:SPLIT_CSV:STIMULUS_SPLIT_CSV                                   -
[-        ] process > HANDLE_DATA:TRANSFORM_CSV:STIMULUS_TRANSFORM_CSV                           -
[-        ] process > HANDLE_DATA:SHUFFLE_CSV:STIMULUS_SHUFFLE_CSV                               -
[-        ] process > HANDLE_TUNE:TORCH_TUNE                                                     -
Execution cancelled -- Finishing pending tasks before exit
Done

ERROR ~ Error executing process > 'CHECK_MODEL:CHECK_TORCH_MODEL (titanic_stimulus.json-titanic_stimulus.csv)'

Caused by:
  Process `CHECK_MODEL:CHECK_TORCH_MODEL (titanic_stimulus.json-titanic_stimulus.csv)` terminated with an error exit status (132)

Command executed:
  launch_check_model.py -d titanic_stimulus.csv -m titanic_model.py -e titanic_stimulus.json -c titanic_model.yaml

Command exit status:
  132

Command output:
  (empty)

Command error:
  .command.sh: line 2: 15 Illegal instruction (core dumped) launch_check_model.py -d titanic_stimulus.csv -m titanic_model.py -e titanic_stimulus.json -c titanic_model.yaml

Work dir:
  /nfs/users/cn/avignoli/stimulus/work/7a/be75a4998174e0b2e5a9c57df4e8cb

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
```
I guess it has to do with how recent the CPUs on the cluster are: exit status 132 is 128 + 4, i.e. the process was killed by SIGILL ("Illegal instruction"), which usually means a compiled library was built for CPU instructions the node does not support. The same image (which seems fine in terms of packages and dependencies) sometimes runs and sometimes does not.
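A rough way to test that theory (a sketch assuming Linux compute nodes, not something from the original thread) is to compare the SIMD flags advertised by a node where the image runs with those of a node where it crashes:

```bash
#!/usr/bin/env bash
# Diagnostic sketch, assuming Linux compute nodes: exit 132 = 128 + SIGILL(4).
# Print the CPU model and the common SIMD extensions the node advertises; run
# this on a node where the container works and on one where it dumps core, then diff.
lscpu | grep -i 'model name'
grep -m1 -o -w -E 'sse4_2|avx|avx2|avx512f' /proc/cpuinfo | sort -u
```

If the failing nodes are missing flags such as avx/avx2 that the working node has, the options are to constrain the job to newer nodes or to use builds compiled for an older CPU baseline.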
The image present in #150 throws the following error on the Seqera Platform here. It seems ps is not an issue anymore, but the memory of the node is.
Solved in PR #150.

Having the `ps` package in each container allows the gathering of task metrics.
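A quick way to verify that (hypothetical command; it assumes the image is available under the tag from the LABEL above) is to call `ps` directly inside the container:

```bash
# Hypothetical check, assuming the tag from the LABEL above is the one pulled:
# Nextflow can only collect task metrics if `ps` is available inside the container.
docker run --rm alessiovignoli3/stimulus:latest ps --version
```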