sulaimanvesal opened this issue 11 months ago
One more question: the CPU version on a Core i7 with an input size of 1024x512 is quite slow. FastSAM-S (Ultralytics) on the same machine with the same input size has an inference time of around 400 ms.
Inference using: efficientsam_ti_cpu.jit
Input size: torch.Size([3, 512, 1024])
Preprocess Time: 79.8783 ms
Inference Time: 6939.1549 ms
@klightz, can you help @sulaimanvesal with passing multiple bounding boxes to the model as a prompt?
@sulaimanvesal, for EfficientSAM we resize the input image to 1024x1024 for model input. The preprocessing and postprocessing are both included in the torchscript model, so you need to include them for FastSAM-S as well. Actually, the demo we host on our server now runs on CPU, an Intel(R) Xeon(R) Platinum 8339HC CPU @ 1.80GHz, and it does not seem that slow even for efficientsam_s_cpu.jit.
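For reference, a minimal timing sketch along these lines measures the full end-to-end path, since the 1024x1024 resize and pre/post-processing all run inside the jit model. The file names and the single point prompt are placeholders, and the (image, points, labels) call signature is assumed from the repo's examples, not confirmed in this thread:

```python
import time
import numpy as np
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Hypothetical paths -- substitute your own image and checkpoint.
model = torch.jit.load("efficientsam_ti_cpu.jit")
image = ToTensor()(np.array(Image.open("example.jpg")))[None, ...]  # [1, 3, H, W], float in [0, 1]

# A single foreground point prompt, just to exercise the full pipeline;
# the 1024x1024 resize and pre/post-processing happen inside the jit model.
points = torch.tensor([[[[400.0, 300.0]]]])  # [1, 1, 1, 2]
labels = torch.tensor([[[1]]])               # [1, 1, 1], 1 = foreground point

start = time.perf_counter()
with torch.no_grad():
    predicted_logits, predicted_iou = model(image, points, labels)
print(f"End-to-end inference: {(time.perf_counter() - start) * 1000:.1f} ms")
```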
@yformer thanks for the reply. @klightz, would you please let us know how to use multiple bounding boxes as a prompt, similar to FastSAM?
Hi @yformer,
Any update on how to run multiple bounding boxes? Thank you.
@balakv504, can you provide an example of using multiple bounding boxes as a prompt?
The input_point to the model has shape [batch_size, num_masks, num_points, 2]. For multiple bounding boxes, you feed in a tensor of shape [1, num_bounding_boxes, 2, 2] (assuming you are querying one image). For EfficientSAM, the encoder runs only once and the decoder runs batched inference. Happy to provide an example in the colab if you have issues using this API.
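For example, a rough sketch of building that tensor from a list of xyxy boxes (the box coordinates here are just placeholders):

```python
import torch

# Hypothetical boxes for one image, in xyxy pixel coordinates.
boxes_xyxy = [[50, 80, 300, 260], [320, 40, 610, 400]]

# Each box contributes two "points" (top-left and bottom-right corner),
# giving the [1, num_bounding_boxes, 2, 2] shape described above.
batched_points = torch.tensor(
    [[[x1, y1], [x2, y2]] for x1, y1, x2, y2 in boxes_xyxy], dtype=torch.float32
).unsqueeze(0)
print(batched_points.shape)  # torch.Size([1, 2, 2, 2])
```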
Thanks @balakrishnanv! A full example in the colab would be great; it would be useful not only for my case but for many others.
I met the same issue, and I found one example in the Grounded-Segment-Anything repo, here. They set batched_points to shape [B, num_box, 2, 2] and batched_points_labels to [B, num_box, 2]. One box point label is set to 2, while the other is 3. But I don't understand how to decide the batched_points_labels here. I just found the related code.
So, for a bounding box, we can just set the labels to [2, 3], similar to the example in Grounded-SAM. It should work.
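For anyone else landing here, a minimal sketch of a full multi-box call under those label conventions. This assumes the torchscript model's forward takes (image, points, labels) and returns (logits, iou) as in the repo's point-prompt example; the paths, box coordinates, and the output-shape assumptions in the comments are my own, not an official API:

```python
import numpy as np
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Placeholder paths and coordinates -- substitute your own.
model = torch.jit.load("efficientsam_ti_cpu.jit")
image = ToTensor()(np.array(Image.open("example.jpg")))[None, ...]  # [1, 3, H, W]

boxes_xyxy = torch.tensor([[50, 80, 300, 260], [320, 40, 610, 400]], dtype=torch.float32)
num_boxes = boxes_xyxy.shape[0]

# Points: the two corners of each box -> [1, num_boxes, 2, 2]
batched_points = boxes_xyxy.reshape(num_boxes, 2, 2).unsqueeze(0)
# Labels: 2 for the top-left corner, 3 for the bottom-right -> [1, num_boxes, 2]
batched_point_labels = torch.tensor([[2, 3]] * num_boxes).unsqueeze(0)

with torch.no_grad():
    predicted_logits, predicted_iou = model(image, batched_points, batched_point_labels)

# Assuming logits come back as [1, num_boxes, num_candidate_masks, H, W]
# and iou as [1, num_boxes, num_candidate_masks]: keep the best candidate per box.
best = predicted_iou.argmax(dim=-1)  # [1, num_boxes]
masks = (predicted_logits[0, torch.arange(num_boxes), best[0]] > 0).numpy()
print(masks.shape)  # one binary mask per box: (num_boxes, H, W)
```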
We will add an example for multiple bbox inference soon. Thanks for your patience. @glennliu Yes that is correct. Thanks for pulling that out.
@yformer I am pinging again in case any of the authors have gotten around to providing a simple example of multi-bbox prompting. I know it's not that hard!
Thanks for sharing this repo.
In the demo Colab file, how can we pass multiple bounding boxes to the model as a prompt?
I have a widget that gets the bboxes from users, and I want to pass them to the model like in FastSAM.