pengsongyou / openscene

[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
https://pengsongyou.github.io/openscene
Apache License 2.0

About LSeg feature #7

Closed Coobiw closed 1 year ago

Coobiw commented 1 year ago

Hello, thanks for this great work! I am currently working with LSeg features and have a question. In LSeg, some refinement layers are applied after the cosine similarity with the text features is computed, like this:

image

Since OpenScene distillation does not involve the text features, I guess you simply use the variable named 'image_features' here (ignoring the part I marked in the figure above). Is that right? Thanks for your reply!
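To make the distinction in this question concrete, here is a minimal sketch of the two quantities involved: the per-pixel image embeddings, and the cosine-similarity logits obtained by correlating them with text embeddings. This is not LSeg's or OpenScene's actual code. The function names and the toy 2-dimensional features are hypothetical and only illustrate the idea in plain Python.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def pixel_text_logits(image_features, text_features):
    """Cosine similarity between every pixel embedding and every text embedding.

    image_features: list of per-pixel C-dim vectors (hypothetical toy data)
    text_features:  list of per-class C-dim vectors (hypothetical toy data)
    Returns a [num_pixels][num_classes] nested list.
    """
    img = [l2_normalize(p) for p in image_features]
    txt = [l2_normalize(t) for t in text_features]
    return [[sum(a * b for a, b in zip(p, t)) for t in txt] for p in img]

# Three "pixels" and two "classes" in a 2-dimensional embedding space.
image_features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
text_features = [[1.0, 0.0], [0.0, 1.0]]

# The per-pixel embeddings exist before any text is involved ...
pixel_embeddings = [l2_normalize(p) for p in image_features]
# ... while the logits only exist once text embeddings are supplied.
logits = pixel_text_logits(image_features, text_features)
```

In this sketch, any layer applied to `logits` depends on the chosen text prompts, whereas `pixel_embeddings` does not.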

Coobiw commented 1 year ago

Oh, additionally, the scannet_2d input is (320, 240). LSeg fuses features from 4 scales, as follows:

image

After upsampling layer4 by 2×, its shape becomes [B,C,20,16], which cannot be added to layer3 of shape [B,C,20,15]. How did you solve this problem? Thanks!
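The mismatch described above can be reproduced with simple arithmetic. The sketch below assumes that layer3 and layer4 sit at strides 16 and 32 and that a non-divisible size is rounded up, which is consistent with the shapes quoted in this comment but is not taken from the LSeg code. It also shows one common workaround, padding the input to a multiple of 32. This is only an illustration, not necessarily what the OpenScene authors do, and the helper names are hypothetical.

```python
import math

def feature_size(size, stride):
    """Spatial size after downsampling by `stride` (assumption: rounds up)."""
    return math.ceil(size / stride)

def pad_to_multiple(size, multiple):
    """Smallest value >= size that is divisible by `multiple`."""
    return math.ceil(size / multiple) * multiple

width, height = 320, 240

# Feature-map sizes as (width, height) at the two coarsest scales.
layer3 = (feature_size(width, 16), feature_size(height, 16))
layer4 = (feature_size(width, 32), feature_size(height, 32))
layer4_up = (layer4[0] * 2, layer4[1] * 2)  # after 2x upsampling
mismatch = layer4_up != layer3              # True: 16 vs 15 along the height

# Workaround sketch: pad the height so both strides divide it evenly.
padded_height = pad_to_multiple(height, 32)
layer3_padded = (feature_size(width, 16), feature_size(padded_height, 16))
layer4_padded_up = (feature_size(width, 32) * 2,
                    feature_size(padded_height, 32) * 2)
```

Another option under the same assumptions would be to resize layer4 directly to layer3's spatial size instead of using a fixed 2× scale factor.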