serengil / deepface

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
https://www.youtube.com/watch?v=WnUVYQP4h44&list=PLsS_1RYmYQQFdWqxQggXHynP1rqaYXv_E&index=1
MIT License
13.73k stars · 2.15k forks

OOM error when use for loop #244

Closed anxueren8 closed 3 years ago

anxueren8 commented 3 years ago

I have tried to use the suggested code as follows to calculate the distance between two faces.

model_name = "Facenet"
model = DeepFace.build_model(model_name)
DeepFace.verify("img1.jpg", "img2.jpg", model_name = model_name, model = model)

But I got this error after 1117/206040 iterations:

ResourceExhaustedError: OOM when allocating tensor with shape[1,613,613,10] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node p_re_lu_42385/Relu_1-0-0-TransposeNCHWToNHWC-LayoutOptimizer}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [Op:__inference_keras_scratch_graph_3394553]

Function call stack: keras_scratch_graph

I have four 2080 Ti on my server, but it looks like it just uses the first GPU with full memory used. So I am wondering whether the memory of GPU causes this problem? Any suggestions will be appreciated.

The code is as follows:

model_name = "ArcFace"
model = DeepFace.build_model(model_name)

pairs_distance_lists = []

for j in range(len(pairs_name_dict)):

    print('{now}/{total}'.format(now = j, total = len(pairs_name_dict)))

    path1 = pairs_dict[j][0] # the path of the first image  e.g './Morphed/001_03_R_morphed_0.41_002_03_R.png'
    path2 = pairs_dict[j][1] # the path of second image e.g '../datasets/AMSL/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'

    result = DeepFace.verify(path1, path2, model_name=model_name, model=model,\
                              distance_metric = "cosine", detector_backend = 'mtcnn')

    # extract distance from result
    a1 = pairs_name_dict[j][0]
    a2 = pairs_name_dict[j][1]
    a3 = result['distance']
    pairs_distance = [a1, a2, a3]
    print(pairs_distance)
    pairs_distance_lists.append(pairs_distance)

Thanks

serengil commented 3 years ago

Could you share the whole code? I could not see the for loop

anxueren8 commented 3 years ago

Thank you for your quick reply. I updated the code.

serengil commented 3 years ago

That's a memory-based error, but you are doing everything according to best practices. Unfortunately, I do not have a GPU to test this case.

Please read this post: https://sefiks.com/2019/03/20/tips-and-tricks-for-gpu-and-multiprocessing-in-tensorflow/

1- Try to set all devices. This should be defined before importing deepface.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

2- Disable allocating all memory up front. Let it allocate only as much as it needs.

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
keras.backend.set_session(session)

3- Change detector_backend to opencv. Does that cause the same error?
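The snippet in step 2 uses the TensorFlow 1.x session API. On TensorFlow 2.x the equivalent memory-growth setting is set per device instead (a sketch, assuming a TF 2.x install):

```python
import tensorflow as tf

# TensorFlow 2.x equivalent of allow_growth: let each visible GPU grow
# its memory footprint on demand instead of pre-allocating everything.
# Must run before any op touches the GPUs.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```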

anxueren8 commented 3 years ago


Thank you for your help. It works with 'dlib', not 'opencv'. Do you have any idea why 'mtcnn' doesn't work? Is it because mtcnn needs more GPU memory?

serengil commented 3 years ago

mtcnn is itself a deep learning model, and it seems that it is allocating your GPU memory.

serengil commented 3 years ago

could you try the following snippet?

model_name = "ArcFace"
model = DeepFace.build_model(model_name)

result = DeepFace.verify(pairs_dict, model_name=model_name, model=model,\
                              distance_metric = "cosine", detector_backend = 'mtcnn')
anxueren8 commented 3 years ago


I have tried the structure as follows and it works with mtcnn.

[['./Morphed_alpha05/001_03_R_morphed_002_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_002_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/002_03_q55.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_003_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_003_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/003_03_q43.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_004_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_004_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/004_03_q51.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_005_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_005_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/005_03_q55.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_007_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
 ['./Morphed_alpha05/001_03_R_morphed_007_03_R.png',
  './AMSL_FaceMorphImageDataSet/londondb_genuine_neutral_passport-scale_15kb/007_03_q45.jpg']]

But the thing is that I need to adjust different factors, so I put the paths of 40 pairs into one dictionary, and the structure becomes the following. I don't know whether it still works; I can try after the current run finishes. It looks like it needs a few days to finish 200k loops.

{'001_002': [['./Morphed/001_03_R_morphed_0.41_002_03_R.png',
   '../datasets/AMSL/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
  ['./Morphed/001_03_R_morphed_0.41_002_03_R.png',
   '../datasets/AMSL/londondb_genuine_neutral_passport-scale_15kb/002_03_q55.jpg'],
  ['./Morphed/001_03_R_morphed_0.42_002_03_R.png',
   '../datasets/AMSL/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
  ['./Morphed/001_03_R_morphed_0.42_002_03_R.png',
   '../datasets/AMSL/londondb_genuine_neutral_passport-scale_15kb/002_03_q55.jpg'],
  ['./Morphed/001_03_R_morphed_0.43_002_03_R.png',
   '../datasets/AMSL/londondb_genuine_neutral_passport-scale_15kb/001_03_q30.jpg'],
  ['./Morphed/001_03_R_morphed_0.43_002_03_R.png',
   '../datasets/AMSL/londondb_genuine_neutral_passport-scale_15kb/002_03_q55.jpg'],
serengil commented 3 years ago

You might run it in parallel to speed it up.

I am closing the issue, if there is nothing else?
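Running in parallel could be sketched as splitting the pair list into chunks, one per worker process (the worker wiring is an assumption, not from this thread; each worker would pin its own GPU via CUDA_VISIBLE_DEVICES before importing deepface):

```python
def chunk(pairs, n_workers):
    """Split the pair list into n_workers roughly equal slices,
    so each worker process can verify its own share independently."""
    size = -(-len(pairs) // n_workers)  # ceiling division
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]

# e.g. with four 2080 Ti cards, spawn four processes and set
# os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_id) in each one
# before importing deepface, then feed it chunk(all_pairs, 4)[worker_id].
```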

anxueren8 commented 3 years ago


Yes, thank you for your help.

serengil commented 3 years ago

I just published deepface 0.0.60. Many production-driven performance issues are handled in this release. Please update the package and re-try.