mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

Fix tensorrt inference to take multiple boxes and points #77

Closed: xuanlinli17 closed this pull request 3 months ago

xuanlinli17 commented 3 months ago

Fix deployment/sam/tensorrt/inference.py and deployment/sam/tensorrt/inferencer.py

zhuoyang20 commented 3 months ago

Hi @xuanlinli17,

Thank you for bringing up the issue. We truly appreciate your efforts in helping us solve it!

After reviewing your pull request, I think there may be a misunderstanding about the input format of the points. For ONNX/TensorRT inference, the default image batch size is 1. args.point should have shape [B, N, 3], where B is the number of output masks (not the image batch size), N is the number of prompt points provided for each mask (if a mask has fewer than N points, pad it by repeating one of its points), and 3 holds the coordinates and the label in the format (x, y, label).
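The [B, N, 3] layout can be sketched as follows. This is only an illustration, not code from the repository: `pad_point_prompts` is a hypothetical helper, the coordinates are made up, and the label values assume the usual SAM convention (1 for foreground, 0 for background), which is not stated in this thread. How the resulting value is handed to the inference script is likewise an assumption.

```python
import json


def pad_point_prompts(prompts):
    """Pad per-mask point lists to a common length N.

    prompts: list of B lists, each with one or more (x, y, label) tuples.
    Returns a nested list with shape [B, N, 3].
    Hypothetical helper for illustration only.
    """
    if not prompts or any(len(p) == 0 for p in prompts):
        raise ValueError("each mask needs at least one (x, y, label) point")

    # N is the largest number of points given for any single mask.
    n = max(len(p) for p in prompts)

    padded = []
    for mask_points in prompts:
        rows = [list(pt) for pt in mask_points]
        # Pad by repeating a point this mask already has, so the
        # padding adds no new prompt information.
        rows += [list(mask_points[-1]) for _ in range(n - len(mask_points))]
        padded.append(rows)
    return padded


# Two output masks (B = 2): the first has one point, the second has two.
# Labels assume 1 = foreground, 0 = background.
prompts = [
    [(320, 240, 1)],
    [(100, 150, 1), (400, 300, 0)],
]

points = pad_point_prompts(prompts)  # shape [2, 2, 3]
print(json.dumps(points))
```

Here B = 2 and N = 2, so the single point of the first mask is repeated once to fill its row.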

I have updated the code to address the issue, and your pull request was very helpful for those modifications. Please try it out and see whether it works for you now. If you have any further questions, please don't hesitate to reach out. Once again, thank you for your valuable contribution!

Best, Zhuoyang

xuanlinli17 commented 3 months ago

It works! Thanks a lot!