tudelft / idnet


Question about metrics #3

Open JamesYang110043 opened 1 month ago

JamesYang110043 commented 1 month ago

Hi @yilun-wu, thank you for your nice work. I am curious about how your metrics (memory usage / runtime / latency) are calculated. Can you release the metrics source code?

yilun-wu commented 1 month ago

Hi @JamesYang110043

Re: Memory usage - we used torchinfo as a guideline for memory usage calculation. However, it's important to note that the "forward/backward pass size" reported by torchinfo is the cumulative memory usage of all layers combined. During inference, only the memory requirement of the largest single layer is incurred at any one time, since layers execute sequentially. Additionally, torchinfo's estimate is limited to Conv layers, so for the transformer layers and correlation volumes we performed the calculations manually.
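To illustrate the cumulative-vs-peak distinction, here is a minimal sketch with made-up layer shapes (fp32 activations; these shapes are illustrative only, not IDNet's actual architecture). Summing all activation sizes gives a torchinfo-style "pass size", while the maximum over single layers bounds what sequential inference actually needs:

```python
# Activation tensor shapes (N, C, H, W) for a hypothetical conv stack
# processing a 1 x 15 x 480 x 640 voxel grid. Shapes are placeholders.
shapes = [
    (1, 15, 480, 640),   # input voxel grid
    (1, 64, 240, 320),   # after a stride-2 conv
    (1, 128, 120, 160),  # after a stride-2 conv
    (1, 256, 60, 80),    # after a stride-2 conv
]

BYTES_PER_FLOAT32 = 4

def mib(shape):
    """Memory of one fp32 activation tensor, in MiB."""
    n = 1
    for d in shape:
        n *= d
    return n * BYTES_PER_FLOAT32 / 2**20

cumulative = sum(mib(s) for s in shapes)   # what a torchinfo-style sum reports
peak_single = max(mib(s) for s in shapes)  # what sequential inference incurs

print(f"cumulative: {cumulative:.1f} MiB, largest single layer: {peak_single:.1f} MiB")
```

The gap between the two numbers grows with network depth, which is why taking the cumulative figure at face value overestimates inference memory.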

Re: Runtime / latency - the following snippet (inserted at the end of https://github.com/tudelft/idnet/blob/master/idn/model/idedeq.py) can be used to obtain numbers similar to those reported in the paper:

if __name__ == '__main__':
    import time

    from hydra import compose, initialize

    torch.backends.cudnn.benchmark = True
    initialize(config_path="../config")
    config = compose(config_name="id_eval")

    print(config.model)
    model = IDEDEQIDO(config.model)
    model.cuda()
    with torch.no_grad():
        # Warm-up call: allocates/reserves all required GPU memory and
        # lets cudnn.benchmark select its kernels before timing starts.
        x = {
            "event_volume_new": torch.rand(1, 15, 480, 640).cuda()
        }
        model(x)
        torch.cuda.synchronize()
        for _ in range(5):
            x = {
                "event_volume_new": torch.rand(1, 15, 480, 640).cuda()
            }
            start_time = time.time()
            model(x)
            # Wait for queued GPU work to finish before reading the clock;
            # otherwise time.time() may only capture kernel launch overhead.
            torch.cuda.synchronize()
            end_time = time.time()
            print("Time: ", end_time - start_time)

Adjust accordingly for different networks; the concept is the same: we measure the inference time for the network to process a single voxel grid of size 15 x 480 x 640. Since IDNet uses an RNN that processes the voxel grid bin by bin, latency is measured as the inference time of the last bin, hence it is roughly 1/15th (1 / number of bins in the voxel grid) of the runtime.
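The runtime-to-latency relation above is simple arithmetic; a small sketch (the runtime value below is a placeholder, not a measured number):

```python
def latency_from_runtime(runtime_s: float, num_bins: int = 15) -> float:
    """Approximate per-bin latency of a recurrent model that consumes a
    voxel grid bin by bin: the flow estimate is ready one bin's worth of
    compute after the last events arrive, so latency ~= runtime / num_bins.
    """
    return runtime_s / num_bins

# Placeholder full-grid runtime of 30 ms (illustrative, not a measurement):
runtime = 0.030
print(latency_from_runtime(runtime))  # 0.002, i.e. 2 ms
```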

Let me know if you have further questions.

Yilun