Status: Closed (closed by toannguyen1904 1 year ago)
Dear @toannguyen1904,
In DeViSE, mapped word embeddings are compared with the embeddings of instances (e.g. the feature representation of an image) using a similarity metric, in order to find the best-matching class for each instance.
In contrast, generative approaches, which are also the basis of our work, learn a generative model conditioned on the word embeddings. Such a model can then generate embeddings of instances of the unseen classes, and these generated instances are used to train the classifier in a supervised manner.
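As a rough illustration of the two strategies (not the code from either paper), here is a minimal NumPy sketch. Everything in it is an assumption made for the example: the function names, the identity projection, the `toy_generator` stand-in for a trained conditional generator, and the small softmax classifier. It also assumes that instance features are projected into the word-embedding space before the comparison.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def devise_style_predict(features, projection, class_word_embs):
    """Embedding-matching inference: map features into word-embedding space
    and pick the class whose word embedding is most similar (cosine)."""
    mapped = l2_normalize(features @ projection)
    classes = l2_normalize(class_word_embs)
    return np.argmax(mapped @ classes.T, axis=1)


def generate_features(generator, word_emb, n, noise_dim, rng):
    """Generative strategy: sample n synthetic features for one class,
    conditioned on that class's word embedding plus random noise."""
    noise = rng.standard_normal((n, noise_dim))
    cond = np.tile(word_emb, (n, 1))
    return generator(np.concatenate([cond, noise], axis=1))


def toy_generator(z):
    """Hypothetical stand-in for a trained generator: the conditioning half
    of the input passes through, perturbed by the noise half."""
    half = z.shape[1] // 2
    return z[:, :half] + 0.1 * z[:, half:]


def fit_softmax_classifier(X, y, n_classes, lr=0.5, steps=200):
    """Plain softmax regression trained with full-batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(X)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


rng = np.random.default_rng(0)
class_word_embs = np.eye(3, 4)  # 3 hypothetical classes, 4-d word embeddings

# Strategy 1: no generated data, classify by similarity to word embeddings.
features = 2.0 * class_word_embs
preds = devise_style_predict(features, np.eye(4), class_word_embs)

# Strategy 2: generate features per class, then train a supervised classifier.
X = np.concatenate(
    [generate_features(toy_generator, w, 50, 4, rng) for w in class_word_embs]
)
y = np.repeat(np.arange(3), 50)
W, b = fit_softmax_classifier(X, y, n_classes=3)
acc = np.mean(np.argmax(X @ W + b, axis=1) == y)
```

In the first strategy the classifier never sees data from unseen classes and relies only on the similarity metric. In the second, the generated features turn the unseen-class problem into ordinary supervised training.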
With best regards, Bjoern
Dear @BjoernMichele, thanks for your quick and insightful answer. However, I actually meant the DeViSe-3DSeg model, cited in your paper as this paper. To my understanding, you adapted this method to your task on 3D point clouds. As far as I understand, though, this method (DeViSe-3DSeg) is a generative approach, so I wonder what the main differences are between your approach and this adaptation.
Dear @toannguyen1904,
Ah, I see. This is probably a misunderstanding; this line should make it clear: "DeViSe-3DSeg is an adaptation of Devise-Seg to 3D point clouds, Devise-Seg being itself an adaptation of DeViSe [22] from classification to segmentation, as proposed in [11]."
[11] is indeed a generative framework, and in particular the one from which we started our own research. In that work, the authors introduce an adaptation of [22] as a baseline for semantic segmentation, which we in turn adapt to 3D data; hence the name DeViSe-3DSeg.
[11]: Zero-shot semantic segmentation, Bucher et al.
[22]: DeViSE: A deep visual-semantic embedding model, Frome et al.
So your baseline is the adaptation of [22], rather than [11]. Am I correct?
Yes, DeViSe-3DSeg is an adaptation of [22].
However, [22] is introduced in [11] as a baseline for semantic segmentation, so we refer to that usage as a baseline for semantic segmentation as well and build on it. It is therefore an adaptation of an adaptation of [22], but the underlying methodology comes from [22].
Thank you.
I'm sorry, but I still can't see how your approach differs from DeViSe-3DSeg, which is described as an adaptation of DeViSe-Seg to 3D point clouds. Your framework and theirs seem similar. Could you please point out the differences?