valeoai / 3DGenZ

Public repository of the 3DV 2021 paper "Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds"

The differences compared with adaptation of DeViSe-3DSeg #9

Closed toannguyen1904 closed 1 year ago

toannguyen1904 commented 1 year ago

I'm sorry, but at this time, I can't see the differences between your approach and DeViSe-3DSeg, which is described as an adaptation of DeViSe-Seg to 3D point clouds. Your framework and theirs seem similar. Could you please point out the differences?

BjoernMichele commented 1 year ago

Dear @toannguyen1904,

In DeViSE, mapped word embeddings are compared with embeddings of instances (e.g. the feature representation of an image) using a similarity metric, to find the best-matching class for each instance.
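As a rough illustration of this matching step, here is a minimal sketch, not code from DeViSE or from this repository. It assumes the instance feature and the class word embeddings already live in a shared space (whatever mapping is needed is taken as already applied), and it uses cosine similarity as the metric. The 2-D vectors, the class names, and the helper names `cosine` and `predict_by_similarity` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_by_similarity(feature, class_embeddings):
    """Return the class whose embedding is most similar to the feature."""
    return max(class_embeddings,
               key=lambda name: cosine(feature, class_embeddings[name]))

# Hypothetical embeddings in a shared 2-D space (assumption for illustration).
class_embeddings = {
    "chair": [1.0, 0.0],
    "table": [0.0, 1.0],
    "sofa": [0.7, 0.7],   # a class may be unseen: only its embedding is needed
}
point_feature = [0.6, 0.8]  # hypothetical feature of one instance

predicted = predict_by_similarity(point_feature, class_embeddings)
```

Note that no classifier is trained on the unseen classes here: prediction relies only on comparing embeddings.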

In contrast, generative approaches, which are also the basis of our work, learn a generative model conditioned on the word embeddings. The model can then generate embedding instances for the unseen classes, and these generated instances are used to train the classifier in a supervised manner.
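The generative pipeline can be sketched in the same toy setting. This is only an illustration under strong assumptions, not the implementation in this repository: the "generator" below is a stand-in that adds Gaussian noise to the word embedding (a real generator would be a trained network), and a nearest-centroid rule is swapped in for the learned classifier. All names and values are hypothetical.

```python
import random

def generate_features(word_embedding, n_samples, rng, noise_scale=0.1):
    """Stand-in generator (assumption): word embedding plus Gaussian noise."""
    return [[w + rng.gauss(0.0, noise_scale) for w in word_embedding]
            for _ in range(n_samples)]

def train_centroid_classifier(features_by_class):
    """Supervised training on labeled features: one mean vector per class."""
    centroids = {}
    for name, feats in features_by_class.items():
        dim = len(feats[0])
        centroids[name] = [sum(f[i] for f in feats) / len(feats)
                           for i in range(dim)]
    return centroids

def classify(feature, centroids):
    """Assign the class whose centroid is closest in squared distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(feature, center))
    return min(centroids, key=lambda name: sq_dist(centroids[name]))

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical word embeddings of unseen classes (no real features available).
unseen_embeddings = {"sofa": [0.7, 0.7], "desk": [-0.5, 0.9]}

# Step 1: generate synthetic features for each unseen class.
synthetic = {name: generate_features(emb, 50, rng)
             for name, emb in unseen_embeddings.items()}

# Step 2: train the classifier on the generated, labeled features.
classifier = train_centroid_classifier(synthetic)

# Step 3: classify a new (hypothetical) instance feature.
prediction = classify([0.6, 0.8], classifier)
```

The key contrast with the similarity-based approach is step 2: the unseen classes get labeled training data, so an ordinary supervised classifier can be used at test time.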

With best regards, Bjoern

toannguyen1904 commented 1 year ago

Dear @BjoernMichele, thanks for your quick and insightful answer. However, I actually meant the DeViSe-3DSeg model, cited in your paper as this paper. To my understanding, you adapt this method to your task on 3D point clouds. But as far as I understand, this method (DeViSe-3DSeg) is a generative approach, so I wonder what the main differences are between your approach and this adaptation.

BjoernMichele commented 1 year ago

Dear @toannguyen1904,

Ah, I see, this is probably a misunderstanding. This line from the paper should make it clear: "DeViSe-3DSeg is an adaptation of Devise-Seg to 3D point clouds, Devise-Seg being itself an adaptation of DeViSe [22] from classification to segmentation, as proposed in [11]."

[11] is indeed a generative framework, and it is the one from which our own research started. In that work, the authors introduce an adaptation of [22] as a baseline for semantic segmentation. We in turn adapt that baseline to 3D data, hence DeViSe-3DSeg.

[11]: Zero-shot semantic segmentation, Bucher et al.
[22]: Devise: A deep visual-semantic embedding model, Frome et al.

toannguyen1904 commented 1 year ago

So your baseline is the adaptation of [22], rather than [11]. Am I correct?

BjoernMichele commented 1 year ago

Yes, DeViSe-3DSeg is an adaptation of [22].

However, [22] is introduced in [11] as a baseline for semantic segmentation, so we also refer to this usage as a baseline for semantic segmentation and build on it. So it is an adaptation of the adaptation of [22], but the underlying methodology is from [22].

toannguyen1904 commented 1 year ago

Thank you.