nixingyang / AdaptiveL2Regularization

[ICPR 2020] Adaptive L2 Regularization in Person Re-Identification
https://ieeexplore.ieee.org/document/9412481
MIT License
64 stars 23 forks

Running on realtime #14

Closed b21627193 closed 3 years ago

b21627193 commented 3 years ago

Hello,

Firstly, thank you for your great work.

I have a simple problem. The "predict" function takes too long (1-2 seconds per call), even with the "use_horizontal_flipping_in_evaluation=False" option. I can't figure out why this happens.

Thanks in advance.

nixingyang commented 3 years ago

Hi, this should not happen. The models in this work are trained with a ResNet backbone, so inference should be quite fast on a decent GPU. I suggest checking the logs to see whether there are warning/error messages. Alternatively, you may measure the inference speed of the original image classification model as a baseline. All the best. Xingyang
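
As a rough illustration of this baseline check, the sketch below (not part of the repository) times a stock Keras ResNet50 on random data; the 224x224 resolution and batch size of 32 are arbitrary assumptions for the test.

```python
# Minimal sketch: time a stock Keras ResNet50 forward pass on random data to
# establish a GPU baseline. Resolution and batch size are arbitrary choices.
import time

import numpy as np
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None)  # random weights suffice for timing
dummy_batch = np.random.rand(32, 224, 224, 3).astype(np.float32)

model.predict(dummy_batch, batch_size=32)  # warm-up, so graph setup is not timed

start = time.time()
model.predict(dummy_batch, batch_size=32)
elapsed = time.time() - start
print("Images per second: %.2f" % (dummy_batch.shape[0] / elapsed))
```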

b21627193 commented 3 years ago

Actually, I use YOLO for detection and I import solution.py (so I can't see the logs). The image goes to a function in solution.py for comparison with the other images stored in my directory. For example, I have a GTX 1050 and it takes 0.6 seconds to compare an image with 32 other images in the directory. Is this performance OK?

The other case is that I want to use your code to make a binary prediction. To do that, I apply a threshold after computing the distance matrix. But in the video, the image size is not fixed (the person may move away from or get closer to the camera), and this causes problems during inference: the distance matrix values get larger. Do you have any suggestions for the implementation?

Thank you.

nixingyang commented 3 years ago

For your reference, one Tesla P100 12GB can process 137.17 images per second with the ResNet50 backbone and use_horizontal_flipping_in_evaluation set to True. This number is reported by the evaluation script. On the other hand, the time spent calculating the cosine distances is negligible, since your gallery size (i.e., 32) is very small.
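
To illustrate why the distance step is cheap, here is a minimal sketch (independent of the repository code) that computes cosine distances between one query feature and a 32-image gallery; the 2048-dimensional feature size is an assumption based on the ResNet50 backbone.

```python
# Minimal sketch: cosine distances for a 32-image gallery are a single small
# matrix operation, so their cost is negligible next to feature extraction.
import numpy as np
from scipy.spatial.distance import cdist

query_features = np.random.rand(1, 2048)     # one query image
gallery_features = np.random.rand(32, 2048)  # 32 gallery images

# distance_matrix has shape (1, 32): one cosine distance per gallery image.
distance_matrix = cdist(query_features, gallery_features, metric="cosine")
best_match_index = int(np.argmin(distance_matrix))
print(distance_matrix.shape, best_match_index)
```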

For your second question, my suggestion is to use the straightforward method, i.e., directly resizing the cropped images to the target resolution. This is consistent with the training procedure. Alternatively, you could modify the model definition so that it accepts inputs of any size, with an input tensor of shape (1, None, None, 3). The disadvantage is that you could not accumulate samples of different sizes into the same batch, so the inference speed would drop significantly.
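
As a sketch of the first option, the snippet below (not from the repository) resizes detector crops to a fixed resolution before inference; the 384x128 target size is a placeholder, not the repository's actual input_shape.

```python
# Minimal sketch of the "resize every crop to the training resolution" option,
# assuming BGR crops coming from an OpenCV-based detector pipeline.
import cv2
import numpy as np

TARGET_HEIGHT, TARGET_WIDTH = 384, 128  # placeholder values; use the repo's input_shape

def prepare_crop(bgr_crop):
    """Resize a detector crop to the fixed input resolution and convert to RGB."""
    resized = cv2.resize(bgr_crop, (TARGET_WIDTH, TARGET_HEIGHT),
                         interpolation=cv2.INTER_LINEAR)
    return cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32)

# Example: a fake 200x80 crop becomes a (384, 128, 3) array.
print(prepare_crop(np.zeros((200, 80, 3), dtype=np.uint8)).shape)
```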

b21627193 commented 3 years ago

> For your second question, my suggestion is to use the straightforward method, i.e., directly resizing the cropped images to the target resolution.

The problem is here: the "preprocess_input" function resizes the image, so a small image gets bigger and a large image gets smaller, and this confuses the model. I couldn't find the correct way to compare two images of different sizes that belong to the same person. I don't think I can use a low batch size because of the performance issues. Also, do you have a previously calculated threshold value?

The way I follow for inference:

Thank you for your time. I really got stuck.

nixingyang commented 3 years ago

Kindly check my reply here. Unfortunately, there is no quick fix. You may also check previous works that explicitly handle this issue, such as Resolution-invariant Person Re-Identification. Xingyang

mertcannkocerr commented 3 years ago

Hello, I am also facing the same problem, so I was glad to find this issue here. I also use an NVIDIA GTX 1050 GPU. Below are the code block I use and the warnings I receive.

```python
if evaluation_only:
    print("Freezing the whole model in the evaluation_only mode ...")
    training_model.trainable = False
    training_model.compile(**training_model.compile_kwargs)

    dataset_folder_path = '/home/mertcan/Desktop/checkPath'

    # One query image ("1.jpg") and two gallery images ("2.jpg", "1.jpg").
    image_file_path_list = [
        os.path.join(dataset_folder_path, item)
        for item in ["1.jpg", "2.jpg", "1.jpg"]
    ]

    image_content_list = [
        read_image_file(image_file_path, input_shape)
        for image_file_path in image_file_path_list
    ]

    image_content_array = preprocess_input(np.array(image_content_list))

    # Time only the forward pass of the inference model.
    start = time.time()
    first, second, third = inference_model.predict(image_content_array,
                                                   batch_size=batch_size)
    end = time.time()
    print('TIME :', end - start, 'batch : ', batch_size)

    distance_matrix = test_evaluator_callback.compute_distance_matrix(
        query_image_features=[first],
        gallery_image_features=[second, third],
        metric="cosine",
        use_re_ranking=use_re_ranking)
```

In this code block, the distance matrix is computed just as you have explained in other issues. However, I noticed that I was getting warnings like these towards the end of the program:

```
2021-04-23 22:03:45.228958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-23 22:03:45.916604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-04-23 22:03:46.098630: W tensorflow/core/common_runtime/bfc_allocator.cc:311] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2021-04-23 22:03:46.132122: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.132188: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.146516: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.146545: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 544.50MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.157406: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.157436: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 544.50MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.175191: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.175219: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.187259: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.187287: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 566.42MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.193800: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.193839: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 155.23MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.202975: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.203007: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 548.69MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.221697: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.232656: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.243071: W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-04-23 22:03:46.249584: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.256152: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.258782: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.294945: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.322761: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.344502: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.353743: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.404893: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-04-23 22:03:46.405452: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 2.08G (2234974208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
TIME : 2.737222671508789 batch : 64
```

I think the cause of the problem is the 4 GB of memory on my NVIDIA card; it does not seem to be enough. Do you agree? Note: the TIME value in the log above reflects only the time spent in the .predict method.

nixingyang commented 3 years ago

@mertcannkocerr The snippet looks correct. The out-of-memory error is caused by the small memory size of your GPU, and because of it the snippet actually runs on the CPU instead. You could either use models trained with smaller backbones or a GPU with more memory. Note that ResNet50 is the smallest backbone in the current implementation. I have an ongoing private project which supports more backbones and achieves better performance; I will keep you updated once it is published. Xingyang
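
For readers hitting the same wall, the hedged sketch below (assuming TensorFlow 2.x, not taken from the repository) shows two common mitigations: enabling GPU memory growth and reducing the predict batch size. Neither helps if a single ResNet50 forward pass at the chosen batch size simply does not fit in 4 GB.

```python
# Hedged sketch of two mitigations on a small GPU, assuming TensorFlow 2.x:
# let TensorFlow allocate GPU memory on demand instead of pre-allocating it,
# and keep the inference batch size small.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    # Must be called before any op is placed on the GPU.
    tf.config.experimental.set_memory_growth(gpu, True)

# Later, for example (inference_model comes from the snippet above):
# features = inference_model.predict(image_content_array, batch_size=4)
```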

mertcannkocerr commented 3 years ago

Thank you very much for your fast and informative replies. This responsiveness is one of the most important reasons we use this model.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

Closing as stale. Please reopen if you'd like to work on this further.