mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0
1.59k stars 141 forks

The output from C++ TensorRT inference is not the same as the Python TensorRT output when I set two or more points to segment #93

Closed yangchengxin closed 3 months ago

yangchengxin commented 3 months ago

When I set two points to segment cat.jpg in the Python demo, I get output like this: efficientvit_sam_demo_tensorrt

However, when I set the same two points in my C++ inference demo, it only gives me one segmentation, like this: image

In my C++ inference demo, I define the two points in a vector. After processing them with the apply_coords function, I declare the point_coords and pointlabels inputs as float arrays, float points[1][2][2] and float labels_[1][1][2], and copy the host data to the device for inference. However, the resulting mask contains only one segmentation. I suspect the way I define the points in my C++ demo is wrong, but I am not sure.
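For reference, here is a minimal sketch of how two prompt points might be packed on the host before the copy to the device. It assumes the SAM-style convention of point_coords with shape [1, N, 2] and one label per point (shape [1, N]); this layout is an assumption and has not been checked against this engine. The names PromptPoint, PackedPrompts and pack_prompts are hypothetical, and the scaling only mimics what an apply_coords-style resize usually does. Note that a float[1][1][2] array holds the same two contiguous floats as float[1][2], so the declaration alone may not be the problem. If the engine was built with a dynamic number of points, the shape set on the execution context at runtime (for example via TensorRT's IExecutionContext::setInputShape) would also need to reflect N = 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one prompt point in original-image pixels.
struct PromptPoint {
    float x;
    float y;
    float label;  // assumed SAM convention: 1 = foreground, 0 = background
};

// Flat host buffers, ready to be copied to device memory.
struct PackedPrompts {
    std::vector<float> coords;  // logical shape [1, N, 2]: x0, y0, x1, y1, ...
    std::vector<float> labels;  // logical shape [1, N]:    l0, l1, ...
    std::size_t num_points = 0; // N, the value the runtime input shape must match
};

// Scale each point from the original image size to the resized model input
// (assumed to match what apply_coords does) and pack everything row-major.
PackedPrompts pack_prompts(const std::vector<PromptPoint>& points,
                           int old_h, int old_w, int new_h, int new_w) {
    assert(old_h > 0 && old_w > 0);
    const float sx = static_cast<float>(new_w) / static_cast<float>(old_w);
    const float sy = static_cast<float>(new_h) / static_cast<float>(old_h);

    PackedPrompts out;
    out.num_points = points.size();
    out.coords.reserve(points.size() * 2);
    out.labels.reserve(points.size());
    for (const PromptPoint& p : points) {
        out.coords.push_back(p.x * sx);  // x first, then y, for every point
        out.coords.push_back(p.y * sy);
        out.labels.push_back(p.label);   // exactly one label per point
    }
    return out;
}
```

With two points, coords would hold 4 floats and labels 2 floats, and num_points (2) is the N to declare for both inputs before running inference.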

I look forward to any suggestions you may have. Best wishes!

xiangw369 commented 3 months ago

I'm very glad that you have done this work. Could you share the relevant C++ inference code?