Closed. Coobiw closed 1 year ago
Additionally, the scannet_2d input is (320, 240). LSeg fuses features at 4 scales, which are:
If layer4 is then upsampled by 2x, its shape becomes [B, C, 20, 16], which cannot be added to layer3 at [B, C, 20, 15]. How did you solve this problem? Thanks!
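To make the mismatch concrete, here is a minimal sketch of the shape arithmetic in plain Python. It assumes the four scales sit at strides 4, 8, 16, and 32 and that odd sizes round up (which is what would give layer4 a side of 8, and 16 after 2x upsampling); both are my assumptions, not confirmed from the LSeg or OpenScene code. The helper names are hypothetical. It also shows one possible workaround, padding the input to a multiple of 32, which is not necessarily what the authors did.

```python
import math

# Assumed strides for layer1..layer4 (not confirmed from the LSeg code).
STRIDES = (4, 8, 16, 32)


def feature_sizes(dim_a, dim_b, strides=STRIDES):
    """Spatial size at each scale, assuming fractional sizes round up."""
    return [(math.ceil(dim_a / s), math.ceil(dim_b / s)) for s in strides]


def pad_to_multiple(size, multiple=32):
    """Smallest value >= size that is divisible by `multiple`."""
    return math.ceil(size / multiple) * multiple


# The scannet_2d input from the question: (320, 240).
sizes = feature_sizes(320, 240)
layer3, layer4 = sizes[2], sizes[3]

# Upsampling layer4 by 2x doubles both spatial dimensions.
upsampled = (layer4[0] * 2, layer4[1] * 2)
mismatch = upsampled != layer3  # True: 16 vs 15 in the second dimension

# Possible workaround: pad the input so every stride divides it evenly.
padded = (pad_to_multiple(320), pad_to_multiple(240))
padded_sizes = feature_sizes(*padded)
padded_upsampled = (padded_sizes[3][0] * 2, padded_sizes[3][1] * 2)

print("layer3:", layer3, "layer4 x2:", upsampled, "mismatch:", mismatch)
print("padded input:", padded, "layer3:", padded_sizes[2],
      "layer4 x2:", padded_upsampled)
```

Another option, instead of a fixed 2x factor, would be to resize layer4 to the exact spatial size of layer3 before adding, for example with `torch.nn.functional.interpolate(x, size=...)`.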
Hello, thanks for this great work! I am currently working with LSeg features and have a question. In LSeg, there are some refinement layers after the cosine similarity with the text features is computed, like:
Since OpenScene distillation does not use the text features, I guess you just use the variable named 'image_features' here, ignoring the part I marked in the figure above. Is that right? Thanks for your reply!