haihuangcode / CMG

The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)

self.audio_semantic_decoder and self.Audio_decoder #2

Open 1090h2400 opened 10 months ago

1090h2400 commented 10 months ago

https://github.com/haihuangcode/CMG/blob/2cbdad8f68d6000657ddf45ace97c855c022334d/code/src/model/main_model_2.py#L507C1-L515C60

Hi! Thanks for your great work! I have some questions I would like to ask you. Is it right to understand it this way: self.audio_semantic_decoder and self.Audio_decoder are used for classification and feature reconstruction, respectively? I also wonder whether this work uses a transformer model, because I noticed a UniEncoder.py file.

Looking forward to hearing from you!

haihuangcode commented 10 months ago

Hello, thank you for your interest in our work.

The code only uses feature reconstruction in the loss function; the classification head is present but does not contribute to the loss. This is primarily because we are doing unsupervised pretraining, so we left that part in place in case we want to extend it later.
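For illustration, here is a minimal sketch (not the repository code) of how two such heads can coexist while only the reconstruction term drives the pretraining objective. Only the attribute names self.audio_semantic_decoder and self.Audio_decoder come from main_model_2.py; the placeholder encoder, layer shapes, and MSE loss are assumptions.

```python
# Minimal sketch, assuming a simple encoder and MSE reconstruction loss.
import torch
import torch.nn as nn

class AudioBranchSketch(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=256, num_classes=28):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hidden_dim)  # placeholder encoder (assumption)
        # Classification head: computed but not used in the pretraining loss.
        self.audio_semantic_decoder = nn.Linear(hidden_dim, num_classes)
        # Reconstruction decoder: this is what the pretraining loss is built on.
        self.Audio_decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, audio_feat):
        z = self.encoder(audio_feat)
        logits = self.audio_semantic_decoder(z)  # kept for possible later extensions
        recon = self.Audio_decoder(z)            # drives the unsupervised objective
        return logits, recon

model = AudioBranchSketch()
audio_feat = torch.randn(8, 10, 128)  # (batch, time, feature) dummy input
logits, recon = model(audio_feat)
# Only the reconstruction term contributes to the loss; the logits are ignored.
loss = nn.functional.mse_loss(recon, audio_feat)
loss.backward()
```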

We did not actually use a transformer. The UniEncoder code came from an earlier attempt whose results were not very satisfactory, so it was left in the repository but is not actually used.

Not all of the released code is used; some of it consists of previous attempts or discarded code. The core code is in pretrain.py, main_model_2.py, CPC.py, models.py, and CLUB.py.