sentinel-hub / field-delineation

Field delineation with Sentinel-2 data from Sentinel-Hub and a ResUnet-a architecture.
MIT License

Running on GPU - memory error #12

Closed · SFrav closed 2 years ago

SFrav commented 2 years ago

Hi there,

Have you run your workflow on GPUs? If so, could you share package versions and any code revisions?

We have tried on Titan X and Tesla K80 GPUs (both in 4-GPU machines). Unfortunately we get the following error:

F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported:

We have tried tensorflow-gpu versions 2.0, 2.1, 2.2 & 2.4 via conda. We are also restricting TF memory allocation and allowing dynamic allocation - see the code below.

import tensorflow as tf

# Limit the initial GPU memory allocation
# (tf.config.gpu.* is not available in stable TF 2.x releases)
tf.config.gpu.set_per_process_memory_fraction(0.1)
# tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.4)

# Dynamic GPU memory allocation, approach 1 (TF 1.x session API via compat.v1)
cfg = tf.compat.v1.ConfigProto()
cfg.gpu_options.allow_growth = True
with tf.compat.v1.Session(config=cfg) as sess:
    train_k_folds(train_k_folds_config)
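
For reference, our understanding of the TF 2.x equivalents via the tf.config device APIs (just a sketch; the 1024 MB cap is an illustrative value, and the two options cannot be combined on the same device):

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')

# Option A: dynamic allocation - memory usage grows as needed
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Option B (instead of A): hard cap via a logical device
# if gpus:
#     tf.config.set_logical_device_configuration(
#         gpus[0],
#         [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])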

Any pointers would be appreciated.

devisperessutti commented 2 years ago

Hi @SFrav,

We do typically run the code on a single GPU with 16GB RAM (on EC2 instances). Did you perhaps modify the batch_size in the config file? If I remember correctly, it should stay at 1; otherwise it might try parallelising over EOPatches.
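
Roughly like this (the surrounding keys are hypothetical placeholders; only batch_size is the point):

train_k_folds_config = {
    'batch_size': 1,  # keep at 1; larger values parallelise over EOPatches
    # ... remaining keys (learning rate, number of folds, ...) as in the repo config
}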

I can provide you with the versions of our working environment if needed.

SFrav commented 2 years ago

Thank you. Reducing batch_size to 1 and running on 1 GPU addressed that error.

Now it progresses to training the model, but throws an error that seems related to the loss function and metrics in TF 2.x. See below:

tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
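
From what I can tell, this error comes from a Python truth-value test on a symbolic tensor inside graph code. A minimal reproduction I put together (my own sketch, not code from this repo):

import tensorflow as tf

@tf.function(autograph=False)
def bad(x):
    # Python bool on a symbolic tensor -> OperatorNotAllowedInGraphError
    if x > 0:
        return x
    return -x

@tf.function(autograph=False)
def good(x):
    # graph-safe conditional instead of a Python `if`
    return tf.where(x > 0, x, -x)

print(good(tf.constant(-2.0)))  # tf.Tensor(2.0, shape=(), dtype=float32)
# bad(tf.constant(-2.0))        # raises the error above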

Following the first answer to this post, I edited training.py as follows, but my hasty hack didn't work.

model.net.compile(
    loss={'extent': TanimotoDistanceLoss(from_logits=False),
          'boundary': TanimotoDistanceLoss(from_logits=False),
          'distance': TanimotoDistanceLoss(from_logits=False)},
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=config.model_config['learning_rate']),
    # comment out the metrics you don't care about
    metrics=[tf.keras.metrics.MeanIoU(num_classes=config.n_classes)])

Are more code revisions needed to run on TF 2.x with a GPU?

devisperessutti commented 2 years ago

We have tested and recently run this code successfully without any modification on TF 2.4 and newer, so it should work for you as well.

Are you trying to train or run inference?

Both losses and metrics are initialised in our training code here, so it might be something else. Could you try using only the accuracy metric and see if there is any difference?
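
Just a sketch of what I mean (run_eagerly=True is optional, only there to turn graph errors into readable tracebacks, and should be dropped for actual training):

model.net.compile(
    loss={'extent': TanimotoDistanceLoss(from_logits=False),
          'boundary': TanimotoDistanceLoss(from_logits=False),
          'distance': TanimotoDistanceLoss(from_logits=False)},
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=config.model_config['learning_rate']),
    metrics=['accuracy'],  # accuracy only, to rule out MeanIoU
    run_eagerly=True)      # optional debugging aid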

For inference the metrics are computed separately, and the compilation can be left without arguments, as shown here.

SFrav commented 2 years ago

@devisperessutti, this error occurs at the training stage.

I just tried excluding the MeanIoU metric, but it still gives the same error. We get this error on the Tesla K80; the original out-of-memory error is still thrown on the Titan X.

Out of curiosity, what's the processing speed-up from using a GPU as opposed to CPUs?

devisperessutti commented 2 years ago

These are the TF versions we are using:

tensorflow                         2.6.2
tensorflow-addons                  0.14.0
tensorflow-estimator               2.6.0

on an NVIDIA T4 GPU with CUDA 10.1. I would give these versions a try just for reference, although it should work with other TF versions compatible with your GPU driver/CUDA version.
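
For example, assuming a pip-based environment (conda equivalents should work too):

pip install tensorflow==2.6.2 tensorflow-addons==0.14.0 tensorflow-estimator==2.6.0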

Did you perhaps modify the size of the patchlets? I would have thought you could fit more than one sample into 12GB of GPU RAM.

The CPU-to-GPU comparison depends on the GPU specs (FLOPS), but I'd say roughly an order of magnitude faster. Running on CPU is feasible if you just want to test the code on a couple of samples, but training on a large number of samples would take a very long time.

Hope this helps.

SFrav commented 2 years ago

Very helpful. Thank you.

Using those exact versions with tensorflow-gpu==2.4, the model starts to run. I also have to prefix the command to make sure only one GPU is used: CUDA_VISIBLE_DEVICES=1 python ....py
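
The same restriction can also be set from inside the script, as long as it happens before TensorFlow is imported:

import os

# Must be set before TensorFlow is imported; "1" is the ordinal of the GPU to expose.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf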

There is another error in training related to tensor conversion, but that doesn't seem to be GPU-related.

I'll close - with thanks!