Zacchaeus00 closed this issue 2 years ago
Hi @Zacchaeus14 ,
Thanks for your feedback!
This is discussed in Section 5.1 of our paper. It looks like you didn't pass all the labels to the model. In that case, we expect the model to be biased toward whichever provided label is closest in the text embedding space (for example, 'car' when 'road' is missing). I'd guess the result will be better if you also input the label 'road'.
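To make the point about missing labels concrete, here is a minimal toy sketch. It is not LSeg's actual code: the 3-D vectors are hand-made stand-ins for text and pixel embeddings, and `predict_label` is a hypothetical helper. It only illustrates that when each pixel is assigned to the most similar label among those provided, leaving out 'road' pushes a road-like pixel to the nearest remaining label.

```python
import math

# Hypothetical, hand-made embeddings (NOT produced by LSeg or CLIP).
# 'road' and 'car' are deliberately placed close to each other.
TEXT_EMB = {
    "road": (1.0, 0.2, 0.0),
    "car": (0.8, 0.6, 0.0),
    "other": (0.0, 0.0, 1.0),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def predict_label(pixel_emb, labels, text_emb=TEXT_EMB):
    """Assign the pixel to the provided label with the highest similarity."""
    return max(labels, key=lambda name: cosine(pixel_emb, text_emb[name]))

# A toy pixel embedding that lies closest to 'road'.
road_pixel = (0.9, 0.1, 0.1)

without_road = predict_label(road_pixel, ["car", "other"])
with_road = predict_label(road_pixel, ["car", "road", "other"])
print(without_road, with_road)
```

In this toy setup the road-like pixel goes to 'car' when 'road' is not among the inputs, and to 'road' once it is added, which is the kind of bias described above.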
Also, we don't train with 'other' very much, so we are a little surprised by LSeg's good generalizability on 'others'. That said, as mentioned, LSeg provides efficient multimodal modeling, and we hope it offers insights for further ideas and work inspired by or built on LSeg.
Hope this helps!
Best, Boyi
@Boyiliee Thanks so much!