labsyspharm / mcmicro

Multiple-choice microscopy pipeline
https://mcmicro.org/
MIT License

Using GPU resources for UnMicst in MCMICRO #546

Closed · SalimSoria closed this 2 months ago

SalimSoria commented 3 months ago

The bioinformatics department and I are working on setting up an MCMICRO pipeline on a remote Linux server. We eventually got both tutorial images (exemplar-001 and exemplar-002) through the pipeline in less than the expected amount of time. However, we now want to run the pipeline using the server's GPU for the UnMicst step, since we expect faster results.

We tried creating a custom.config file (as per https://github.com/labsyspharm/mcmicro/issues/354) and running the following command:

nextflow run labsyspharm/mcmicro --in exemplar-001 -c custom.config -profile singularity

Although the run was successful, the command log showed that the GPU was never found, and the step again fell back to the CPU + RAM. We also tried manually adding the --nv argument in `singularity.config`, but it seems to be ignored. We have also ensured that nvidia-smi can be accessed from the container.

The server uses a Tesla M40 24GB and is running CUDA 11.0.

Here is the command.log

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/compat/v2_compat.py:111: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
sh: 1: nvidia-smi: not found
2024-04-15 18:06:59.132729: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
/app/UnMicst1-5.py:114: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  bn = tf.nn.leaky_relu(tf.layers.batch_normalization(c00+shortcut, training=UNet2D.tfTraining))
/usr/local/lib/python3.8/dist-packages/keras/legacy_tf_layers/normalization.py:455: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  return layer.apply(inputs, training=training)
/app/UnMicst1-5.py:136: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  lbn = tf.nn.leaky_relu(tf.layers.batch_normalization(
/app/UnMicst1-5.py:139: UserWarning: `tf.layers.dropout` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dropout` instead.
  return tf.layers.dropout(lbn, 0.35, training=UNet2D.tfTraining)
/usr/local/lib/python3.8/dist-packages/keras/legacy_tf_layers/core.py:401: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  return layer.apply(inputs, training=training)
/app/UnMicst1-5.py:199: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  tf.layers.batch_normalization(tf.nn.conv2d(cc, luXWeights2, strides=[1, 1, 1, 1], padding='SAME'),
/app/UnMicst1-5.py:220: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  return tf.layers.batch_normalization(
Using CPU
loading data
loading data
loading data
0.34
0.25
Model restored.
Using channel 1
Inference...
Inference...
Inference...
ArtemSokolov commented 3 months ago

Hi @SalimSoria,

Can you please share your custom.config?

Also, can you try nextflow run labsyspharm/mcmicro --in exemplar-001 -c custom.config -profile singularity,GPU? (Note: no spaces around the comma.)

And one more question: is it just UnMicst, or are none of the segmentation containers using the GPU? You can try running them all in parallel with the following params.yml:

workflow:
  segmentation: [unmicst, mesmer, cellpose]

The thing I would check is whether the .command.run files in the corresponding work/ directories contain the expected singularity commands (usually with grep singularity work/*/*/.command.run).

SalimSoria commented 3 months ago

Here is the custom.config file:

Docker.runoptions = '--cpus 0.000 --gpus all'
Singularity.runOptions = '—C –-nv'

We also tried running the command with -profile singularity,GPU and without the custom.config file. This actually caused the GPU to be detected, but now we've run into another error; see the .command.log below. The server does have cuDNN installed, but it may not be visible within the container.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/compat/v2_compat.py:111: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Tue Apr 16 00:03:18 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla M40 24GB                 Off |   00000000:82:00.0 Off |                    0 |
| N/A   44C    P0             59W /  250W |       0MiB /  23040MiB |     95%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
2024-04-16 00:03:35.142294: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2024-04-16 00:03:35.143246: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
/app/UnMicst1-5.py:114: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  bn = tf.nn.leaky_relu(tf.layers.batch_normalization(c00+shortcut, training=UNet2D.tfTraining))
/usr/local/lib/python3.8/dist-packages/keras/legacy_tf_layers/normalization.py:455: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  return layer.apply(inputs, training=training)
/app/UnMicst1-5.py:136: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  lbn = tf.nn.leaky_relu(tf.layers.batch_normalization(
/app/UnMicst1-5.py:139: UserWarning: `tf.layers.dropout` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dropout` instead.
  return tf.layers.dropout(lbn, 0.35, training=UNet2D.tfTraining)
/usr/local/lib/python3.8/dist-packages/keras/legacy_tf_layers/core.py:401: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  return layer.apply(inputs, training=training)
/app/UnMicst1-5.py:199: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  tf.layers.batch_normalization(tf.nn.conv2d(cc, luXWeights2, strides=[1, 1, 1, 1], padding='SAME'),
/app/UnMicst1-5.py:220: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  return tf.layers.batch_normalization(
automatically choosing GPU
Using GPU 0
loading data
loading data
loading data
0.34
0.25
Model restored.
Using channel 1
Inference...
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1380, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1363, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1456, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) UNKNOWN: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node downsampling/ld0/Conv2D}}]]
     [[Softmax/_123]]
  (1) UNKNOWN: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node downsampling/ld0/Conv2D}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/UnMicst1-5.py", line 848, in <module>
    PM = np.uint8(255 * UNet2D.singleImageInference(cells, 'accumulate',
  File "/app/UnMicst1-5.py", line 704, in singleImageInference
    output = UNet2D.Session.run(UNet2D.nn, feed_dict={UNet2D.tfData: batchData, UNet2D.tfTraining: 0})
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 970, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1193, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1373, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1399, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) UNKNOWN: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node downsampling/ld0/Conv2D
 (defined at /app/UnMicst1-5.py:102)
]]
     [[Softmax/_123]]
  (1) UNKNOWN: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node downsampling/ld0/Conv2D
 (defined at /app/UnMicst1-5.py:102)
]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node downsampling/ld0/Conv2D:
In[0] placeholders/data (defined at /app/UnMicst1-5.py:80)  
In[1] downsampling/ld0/kernelD0/read (defined at /app/UnMicst1-5.py:86)

Operation defined at: (most recent call last)
>>>   File "/app/UnMicst1-5.py", line 770, in <module>
>>>     UNet2D.singleImageInferenceSetup(modelPath, GPU, args.mean, args.std)
>>> 
>>>   File "/app/UnMicst1-5.py", line 660, in singleImageInferenceSetup
>>>     UNet2D.setupWithHP(hp)
>>> 
>>>   File "/app/UnMicst1-5.py", line 43, in setupWithHP
>>>     UNet2D.setup(hp['imSize'],
>>> 
>>>   File "/app/UnMicst1-5.py", line 150, in setup
>>>     dsX.append(down_samp_layer(dsX[i], i))
>>> 
>>>   File "/app/UnMicst1-5.py", line 102, in down_samp_layer
>>>     c00 = tf.nn.conv2d(data, ldXWeights1, strides=[1, 1, 1, 1], padding='SAME')
>>> 

Input Source operations connected to node downsampling/ld0/Conv2D:
In[0] placeholders/data (defined at /app/UnMicst1-5.py:80)  
In[1] downsampling/ld0/kernelD0/read (defined at /app/UnMicst1-5.py:86)

Operation defined at: (most recent call last)
>>>   File "/app/UnMicst1-5.py", line 770, in <module>
>>>     UNet2D.singleImageInferenceSetup(modelPath, GPU, args.mean, args.std)
>>> 
>>>   File "/app/UnMicst1-5.py", line 660, in singleImageInferenceSetup
>>>     UNet2D.setupWithHP(hp)
>>> 
>>>   File "/app/UnMicst1-5.py", line 43, in setupWithHP
>>>     UNet2D.setup(hp['imSize'],
>>> 
>>>   File "/app/UnMicst1-5.py", line 150, in setup
>>>     dsX.append(down_samp_layer(dsX[i], i))
>>> 
>>>   File "/app/UnMicst1-5.py", line 102, in down_samp_layer
>>>     c00 = tf.nn.conv2d(data, ldXWeights1, strides=[1, 1, 1, 1], padding='SAME')
>>> 

Original stack trace for 'downsampling/ld0/Conv2D':
  File "/app/UnMicst1-5.py", line 770, in <module>
    UNet2D.singleImageInferenceSetup(modelPath, GPU, args.mean, args.std)
  File "/app/UnMicst1-5.py", line 660, in singleImageInferenceSetup
    UNet2D.setupWithHP(hp)
  File "/app/UnMicst1-5.py", line 43, in setupWithHP
    UNet2D.setup(hp['imSize'],
  File "/app/UnMicst1-5.py", line 150, in setup
    dsX.append(down_samp_layer(dsX[i], i))
  File "/app/UnMicst1-5.py", line 102, in down_samp_layer
    c00 = tf.nn.conv2d(data, ldXWeights1, strides=[1, 1, 1, 1], padding='SAME')
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1096, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/nn_ops.py", line 2431, in conv2d
    return gen_nn_ops.conv2d(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 969, in conv2d
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3697, in _create_op_internal
    ret = Operation(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 2101, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Here is the relevant line from the .command.run file, found with grep singularity work/*/*/.command.run:

set +u; env - PATH="$PATH" ${TMP:+SINGULARITYENV_TMP="$TMP"} ${TMPDIR:+SINGULARITYENV_TMPDIR="$TMPDIR"} ${NXF_TASK_WORKDIR:+SINGULARITYENV_NXF_TASK_WORKDIR="$NXF_TASK_WORKDIR"} singularity exec --no-home --pid -B /localtmp/test2/work -C -H "$PWD" --nv /localtmp/test2/work/singularity/labsyspharm-unmicst-2.7.7.img /bin/bash -ue /localtmp/test2/work/5f/2e7acc7242e8aa232b177fa56be9da/.command.sh

I haven't had the time to try all three segmentation options. I'll get back to you on that when I'm done.

By the way, is there a way to limit the number of CPU cores used during a run when not using the GPU?

ArtemSokolov commented 3 months ago

Hi @SalimSoria,

I think the reason it wasn't finding your GPU with the custom.config is capitalization: singularity and docker should be all lowercase, while runOptions should have a capital O.
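
For reference, a corrected version of that config (keeping your run options as-is, but with lowercase scope names, a capital O in runOptions, and plain ASCII dashes) would presumably look like:

docker.runOptions = '--cpus 0.000 --gpus all'
singularity.runOptions = '-C --nv'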

The GPU config profile does effectively the same thing, and based on what you shared, it looks like the GPU is now visible inside the container.

Unfortunately, the cuDNN errors are notoriously hard to debug. @clarenceyapp can chime in here, but the two most common issues are 1) driver incompatibility and 2) running out of GPU memory. Your driver version 550.54.15 is very new, while UnMicst is still based on a fairly old version of TensorFlow, so there could be an incompatibility there. I don't suspect memory issues, simply because exemplar-001 is tiny and UnMicst already implements the standard suggestion (allowing GPU memory growth) for these types of errors.

I am curious to see whether you have similar issues with Mesmer and Cellpose, but if we were to debug UnMicst, the next step would be to launch the TensorFlow container that UnMicst is based on in an interactive session with:

singularity shell -C --nv docker://tensorflow/tensorflow:2.7.1-gpu

then, once inside the container, start a Python shell, type import tensorflow.compat.v1 as tf, and follow it with these commands: https://github.com/HMS-IDAC/UnMicst/blob/master/UnMicst1-5.py#L434-L438 to see if you can reproduce the cuDNN issue. From there, we would either try a newer TensorFlow container to identify whether it's a version-compatibility issue, or try additional GPU config options to rule out out-of-memory issues.
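
As a rough sketch of that interactive check (this is not the exact UnMicst code; the tensor shapes here are arbitrary placeholders), running a single convolution through a compat.v1 session is enough to force cuDNN to initialize:

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()                         # same compat mode UnMicst runs in

config = tf.ConfigProto()
config.gpu_options.allow_growth = True           # don't grab all GPU memory up front
config.allow_soft_placement = True

data = tf.placeholder(tf.float32, [1, 64, 64, 1])       # dummy input
kernel = tf.get_variable('k', [3, 3, 1, 8])              # dummy conv weights
conv = tf.nn.conv2d(data, kernel, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(conv, feed_dict={data: np.zeros((1, 64, 64, 1), np.float32)})
    print('cuDNN convolution ran OK')

If that single conv2d fails with the same CUDNN_STATUS_INTERNAL_ERROR, the problem is at the container/driver level rather than anything specific to UnMicst.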

To limit CPUs, it should just be a matter of adding --cpus 4 to singularity.runOptions.

SalimSoria commented 2 months ago

What would be the best driver version to have for UnMicst? We have no problem changing to an older driver version if need be.

We followed the instructions to test UnMicst.

# singularity shell -C --nv docker://tensorflow/tensorflow:2.7.1-gpu
...
Singularity> python
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow.compat.v1 as tf
>>> saver = tf.train.Saver()
WARNING:tensorflow:Saver is deprecated, please switch to tf.train.Checkpoint or tf.keras.Model.save_weights for training checkpoints. When executing eagerly variables do not necessarily have unique names, and so the variable.name-based lookups Saver performs are error-prone.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 899, in __init__
    raise RuntimeError(
RuntimeError: When eager execution is enabled, `var_list` must specify a list or dict of variables to save
>>> config = tf.ConfigProto()
>>> config.gpu_options.allow_growth = True
>>> config.allow_soft_placement = True
>>> sess = tf.Session(config=config)
2024-04-18 00:03:57.731050: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-18 00:04:04.536372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22209 MB memory:  -> device: 0, name: Tesla M40 24GB, pci bus id: 0000:82:00.0, compute capability: 5.2
>>> 
ArtemSokolov commented 2 months ago

^ @clarenceyapp This is probably a question for you.

clarenceyapp commented 2 months ago

Hi @SalimSoria, there appear to be reports of incompatibility between TensorFlow 2.7 and CUDA 12. I'm using CUDA 11.3.1. Also, just to confirm, cuDNN needs to be installed (I'm using version 8.2.1). Please let us know if that works.

clarenceyapp commented 2 months ago

This link refers to an old article but is still relevant. It even suggests using slightly older CUDA versions; anything from the CUDA 11 series should work.

SalimSoria commented 2 months ago

The bioinformatics department looked at the dependencies installed in the labsyspharm-unmicst-2.7.7.img image and listed them as:

CUDA: 11.0 and 11.2 are installed, but 11.2 is the default
cuDNN: libcudnn.so.8.1.0
TensorFlow: 2.7.1
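
For reference, these versions can presumably also be confirmed from inside the container (image path taken from the .command.run line above), since tf.sysconfig.get_build_info() reports the CUDA and cuDNN versions TensorFlow was built against:

singularity exec /localtmp/test2/work/singularity/labsyspharm-unmicst-2.7.7.img \
  python -c "import tensorflow as tf; print(tf.__version__); print(tf.sysconfig.get_build_info())"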

They then installed cuda-11.2 on the host so that the matching kernel driver would be installed along with it (driver version 460.27.04). This may have fixed the issue, since the GPU was used (based on the log), but the cuDNN error was still present. They also mentioned that the CPU was pegged during the unmicst step, and we were wondering whether this step is CPU-bound even when using the GPU. Nonetheless, this run finished using both the CPU and GPU, without really improving the time to completion.

Here is the command log for the unmicst step:

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/compat/v2_compat.py:111: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Tue Apr 23 20:35:28 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M40 24GB      Off  | 00000000:82:00.0 Off |                    0 |
| N/A   42C    P0    60W / 250W |      0MiB / 22945MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
2024-04-23 20:35:32.789820: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
/app/UnMicst1-5.py:114: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  bn = tf.nn.leaky_relu(tf.layers.batch_normalization(c00+shortcut, training=UNet2D.tfTraining))
/usr/local/lib/python3.8/dist-packages/keras/legacy_tf_layers/normalization.py:455: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  return layer.apply(inputs, training=training)
/app/UnMicst1-5.py:136: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  lbn = tf.nn.leaky_relu(tf.layers.batch_normalization(
/app/UnMicst1-5.py:139: UserWarning: `tf.layers.dropout` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dropout` instead.
  return tf.layers.dropout(lbn, 0.35, training=UNet2D.tfTraining)
/usr/local/lib/python3.8/dist-packages/keras/legacy_tf_layers/core.py:401: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  return layer.apply(inputs, training=training)
/app/UnMicst1-5.py:199: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  tf.layers.batch_normalization(tf.nn.conv2d(cc, luXWeights2, strides=[1, 1, 1, 1], padding='SAME'),
/app/UnMicst1-5.py:220: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  return tf.layers.batch_normalization(
automatically choosing GPU
Using GPU 0
loading data
loading data
loading data
0.34
0.25
Model restored.
Using channel 1
Inference...
Inference...
Inference... 
clarenceyapp commented 2 months ago

Hi @SalimSoria, I think that is looking better than before. With the exception of the failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error message, all the other warning messages are expected when a GPU has been found. There is an optional image-resizing step before inference that might be CPU-heavy, but other than that, CPU loads should be small.

Can you let me know:

  1. what is the size of your image, in terms of the number of pixels in the x and y dimensions?
  2. how long is it taking? Also, did you try running exemplar-001? UnMICST should take <30 seconds on a consumer-grade GPU but approximately 1-2 minutes on CPU.
SalimSoria commented 2 months ago

Hi @clarenceyapp

Sorry, I didn't mention this before. (1) The run I showed the command log for used exemplar-002. (2) That run took approximately 16 minutes with the default params.yml (with the exception of -profile singularity,GPU). Before the GPU was successfully detected, it took about 19 minutes to complete.

I'll check how much quicker the UnMICST step on exemplar-001 is when using the GPU vs. the CPU.

SalimSoria commented 2 months ago

@ArtemSokolov

Sorry, could you explain how to set singularity.runOptions to --cpus 4? Is this something that can be set with the command singularity run [run options...] <container>, or can I set it within a custom params.yml?

ArtemSokolov commented 2 months ago

Oh yea, sorry, this goes inside a custom.config:

singularity.runOptions = '-C -H "$PWD" --nv --cpus 4'

which you can supply to the pipeline with

nextflow run labsyspharm/mcmicro --in exemplar-001 -profile singularity,GPU -c custom.config

I realize that params.yml vs. custom.config is a potential source of confusion. A good rule of thumb for Nextflow pipelines is: options that control what the pipeline computes (module selection and module parameters) go in params.yml, while settings that control the execution environment (containers, GPUs, CPU limits) go in a config file such as custom.config.

SalimSoria commented 2 months ago

@ArtemSokolov I tried adding this to the custom.config and got the error Error for command "exec": unknown flag: --cpus, followed by a list of possible arguments that does not include --cpus. However, I did see an option --vm-cpu.

Here is the command.log

$ nextflow run labsyspharm/mcmicro --in ./exemplar-001 -c custom.config -profile singularity,GPU 
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/labsyspharm/mcmicro` [fabulous_aryabhata] DSL2 - revision: 69ee2efe21 [master]
executor >  local (1)
[-        ] process > illumination                -
[1b/cac782] process > registration:ashlar (1)     [100%] 1 of 1, failed: 1 ✘
[-        ] process > background:backsub          -
[-        ] process > dearray:coreograph          -
[-        ] process > dearray:roadie:runTask      -
[-        ] process > segmentation:roadie:runTask -
[-        ] process > segmentation:worker         -
[-        ] process > segmentation:s3seg          -
[-        ] process > quantification:mcquant      -
[-        ] process > downstream:worker           -
[-        ] process > viz:autominerva             -
ERROR ~ Error executing process > 'registration:ashlar (1)'

Caused by:
  Process `registration:ashlar (1)` terminated with an error exit status (1)

Command executed:

  ashlar 'exemplar-001-cycle-06.ome.tiff' 'exemplar-001-cycle-07.ome.tiff' 'exemplar-001-cycle-08.ome.tiff'  -m 30 --ffp exemplar-001-cycle-06-ffp.tif exemplar-001-cycle-07-ffp.tif exemplar-001-cycle-08-ffp.tif --dfp exemplar-001-cycle-06-dfp.tif exemplar-001-cycle-07-dfp.tif exemplar-001-cycle-08-dfp.tif -o exemplar-001.ome.tif

Command exit status:
  1

Command output:
  (empty)

Command error:
        --no-home                do NOT mount users home directory if home
                                 is not the current working directory
        --no-init                do NOT start shim process with --pid
        --no-nv                  
        --no-privs               drop all privileges from root user in container)
        --nohttps                do NOT use HTTPS with the docker://
                                 transport (useful for local docker
                                 registries without a certificate)
        --nonet                  disable VM network handling
        --nv                     enable experimental Nvidia support
    -o, --overlay strings        use an overlayFS image for persistent data
                                 storage or as read-only layer of container
        --passphrase             prompt for an encryption passphrase
        --pem-path string        enter an path to a PEM formated RSA key for
                                 an encrypted container
    -p, --pid                    run container in a new PID namespace
        --pwd string             initial working directory for payload
                                 process inside the container
        --rocm                   enable experimental Rocm support
    -S, --scratch strings        include a scratch directory within the
                                 container that is linked to a temporary dir
                                 (use -W to force location)
        --security strings       enable security features (SELinux,
                                 Apparmor, Seccomp)
    -u, --userns                 run container in a new user namespace,
                                 allowing Singularity to run completely
                                 unprivileged on recent kernels. This
                                 disables some features of Singularity, for
                                 example it only works with sandbox images.
        --uts                    run container in a new UTS namespace
        --vm                     enable VM support
        --vm-cpu string          number of CPU cores to allocate to Virtual
                                 Machine (implies --vm) (default "1")
        --vm-err                 enable attaching stderr from VM
        --vm-ip string           IP Address to assign for container usage.
                                 Defaults to DHCP within bridge network.
                                 (default "dhcp")
        --vm-ram string          amount of RAM in MiB to allocate to Virtual
                                 Machine (implies --vm) (default "1024")
    -W, --workdir string         working directory to be used for /tmp,
                                 /var/tmp and $HOME (if -c/--contain was
                                 also used)
    -w, --writable               by default all Singularity containers are
                                 available as read only. This option makes
                                 the file system accessible as read/write.
        --writable-tmpfs         makes the file system accessible as
                                 read-write with non persistent data (with
                                 overlay support only)

  Run 'singularity exec --help' for more detailed usage information.
ArtemSokolov commented 2 months ago

You may need to chase down the best way to do this for your Singularity distribution. It seems that the --cpus option may only be available in the CE (community edition) distribution: https://docs.sylabs.io/guides/main/user-guide/cgroups.html#cpu-limits

It looks like the Apptainer distribution (which is likely what you have) uses a completely different configuration method: https://apptainer.org/user-docs/master/cgroups.html#limiting-container-resources-with-cgroups

The challenge with this method is making cgroups.toml visible inside the container. One option is to put cgroups.toml in a fixed location on your system and then mount that location into every container. So, something like this:

singularity.runOptions = '-C -H "$PWD" --nv -B /path/to/fixed/loc --apply-cgroups /path/to/fixed/loc/cgroups.toml'
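
A minimal cgroups.toml for that fixed location might look like this (the keys follow the OCI runtime spec; check the linked docs for the exact options your Apptainer/Singularity version supports):

[cpu]
    cpus = "0-3"    # restrict containers to cores 0-3, i.e. four cores

Under the same [cpu] section, a quota/period pair (e.g. quota = 400000, period = 100000) should similarly cap usage at roughly four cores' worth of CPU time.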

--vm-cpu doesn't sound right to me, because it says the default is "1", but it sounds like your runs are already using more than that. Still, it might be worth a try to see if this option works.

You can also try telling Nextflow that you only want to use 4 CPUs for each process, and see if it can figure out what to do. This is also done in the custom.config:

process.cpus = 4
singularity.runOptions = '-C -H "$PWD" --nv'
SalimSoria commented 2 months ago

We found a workaround using cgroups. We were able to complete both exemplar-001 and exemplar-002 while also limiting the number of available CPU cores.

However, I may need to revisit this ticket in the future since there are plans to upgrade some of the server's components including its GPU. Thanks again!

My next issue is getting MCMICRO to work on my images which I'll submit a ticket for.

ArtemSokolov commented 2 months ago

Great to hear you got it working. I will close the issue for now, but feel free to reopen if/when you have follow-up issues.