Open kritohyh opened 2 years ago
Could the delay be caused by the copy operation in `torch.gather`?
Hi, thanks for your interest in our work!
Yes, compared to vanilla ViTs, EViT has the `gather` and `topk` operations that require additional GPU kernel launches, whose computational overhead is non-negligible when the batch size is 1.
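To make the overhead concrete, here is a minimal sketch of EViT-style token selection, not the repository's actual code: `select_tokens` and its argument names are hypothetical, but the `topk` + `gather` pattern is the one being discussed.

```python
import torch

def select_tokens(x, cls_attn, keep_rate=0.7):
    """Hypothetical sketch: keep the top-k patch tokens ranked by [CLS] attention.
    x: (B, N, C) patch tokens; cls_attn: (B, N) attention scores from the CLS token."""
    B, N, C = x.shape
    k = max(1, int(N * keep_rate))
    # topk and gather each launch their own GPU kernels; at batch size 1
    # this fixed launch overhead is a noticeable fraction of total latency.
    _, idx = torch.topk(cls_attn, k, dim=1)                       # (B, k)
    kept = torch.gather(x, 1, idx.unsqueeze(-1).expand(B, k, C))  # (B, k, C)
    return kept, idx
```

At large batch sizes the same two kernel launches are amortized over many images, which is why the slowdown only shows up at batch size 1.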
That said, deploying EViT for real-time inference on video streams may require overcoming the overhead of these additional operations. Would it be possible to use TensorRT or a custom operator to solve this problem? I'd appreciate your advice!
The overhead may be caused by the `complement_idx` function in `helpers.py`. I will check it soon.
For your use case with video streams, can't the video be viewed as a series of images, so that the batch size is greater than 1?
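For reference, the complement operation can be written as a short sketch. This is an assumed reimplementation of what `complement_idx` does (return the token indices *not* selected), not the code from `helpers.py` itself; the scatter and boolean-indexing kernels it needs are a plausible source of the extra launches.

```python
import torch

def complement_idx_sketch(idx, dim):
    """Hypothetical sketch: given kept indices idx of shape (..., k) drawn from
    range(dim), return the (..., dim - k) indices that were NOT kept.
    Each of scatter_, arange, and boolean indexing launches its own kernel."""
    mask = torch.ones(*idx.shape[:-1], dim, dtype=torch.bool, device=idx.device)
    mask.scatter_(-1, idx, False)          # mark kept positions as False
    # every row has exactly dim - k True entries, so the flat result reshapes cleanly
    compl = torch.arange(dim, device=idx.device).expand_as(mask)[mask]
    return compl.view(*idx.shape[:-1], dim - idx.shape[-1])
```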
Looking forward to your results! In my scenario, frames are extracted sparsely, so I need to test the inference speed on single frames. But because my image resolution is large, the images are partitioned into a large effective batch, so EViT still delivers its high performance.
Your research is very meaningful. But when I turn the batch size down, why doesn't EViT perform so well? I hope you can dispel my doubts.
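One way to check whether the gap at small batch sizes comes from fixed per-call overhead (kernel launches, dispatch) rather than compute is to measure per-image throughput at several batch sizes. A minimal sketch, assuming a generic `model` callable:

```python
import time
import torch

def throughput(model, batch_size, img_size=224, iters=20):
    """Images per second at a given batch size. CPU sketch: on GPU you would
    call torch.cuda.synchronize() before reading the clock, since CUDA
    kernels execute asynchronously."""
    x = torch.randn(batch_size, 3, img_size, img_size)
    with torch.no_grad():
        model(x)  # warmup
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        dt = time.perf_counter() - t0
    return batch_size * iters / dt
```

If per-image throughput rises sharply from batch size 1 to, say, 32, the fixed overhead of the extra `topk`/`gather` launches dominates at batch size 1, which matches the explanation above.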