Closed · sphmel closed this issue 1 month ago
Hi @seaplus296, you are right: post_adapter is the tts_adapter. But the open-source version does not include the tts_adapter.
@mini-omni Then, does the paper version use a 30-layer transformer for the TTS output? Does that mean the lm_head is not shared between the code vocab and the LM vocab?
By the way, if the model released here differs from the one in the technical report, I think it would be good to note that....
> @mini-omni Then, does the paper version use a 30-layer transformer for the TTS output? Does that mean the lm_head is not shared between the code vocab and the LM vocab?
Yes, we add an extra 6 layers for the tts_adapter, and the two vocabs do not share an lm_head (even in the version without the tts_adapter).
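For illustration, here is a minimal sketch of what a 6-layer tts_adapter with its own code head could look like. All names and dimensions below are hypothetical placeholders, not taken from the mini-omni codebase:

```python
import torch.nn as nn

class TTSAdapterSketch(nn.Module):
    """Hypothetical sketch: 6 extra transformer layers stacked on the
    backbone's hidden states, with a code_head that is separate from the
    text lm_head (the two vocabs do not share an output projection)."""

    def __init__(self, d_model=896, n_heads=14, n_layers=6, code_vocab_size=4160):
        super().__init__()
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        # Separate projection for audio codes; distinct from the text lm_head.
        self.code_head = nn.Linear(d_model, code_vocab_size, bias=False)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, d_model) from the LM backbone
        return self.code_head(self.blocks(hidden_states))
```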
> By the way, if the model released here differs from the one in the technical report, I think it would be good to note that....
Thanks for your suggestion, I'll add it to the README.
@mini-omni Thanks. Finally, is there any performance difference between the report's architecture and the open-source version? Why was the architecture changed?
> @mini-omni Thanks. Finally, is there any performance difference between the report's architecture and the open-source version? Why was the architecture changed?
The open-source version is the one we used in parallel during development for experimental comparison. In our subjective evaluation, the version with the tts_adapter has less impact on the model's text-processing capabilities.
@mini-omni Thank you for the kind replies.
@mini-omni Sorry, I have a follow-up question. If the open-source version's tts_adapter is just a vocab expansion, how is it trained in stage 1? Are only the expanded parameters trained?
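To make the question concrete: "training only the expanded parameters" would typically mean freezing the pretrained rows of the (now larger) embedding and output matrices and updating only the newly added rows. A minimal sketch of one way to do that with a gradient mask; the row counts are placeholders:

```python
import torch

def train_only_expanded_rows(embedding: torch.nn.Embedding, n_original_rows: int):
    """Hypothetical sketch: zero the gradients of the original vocab rows so
    only the newly added (expanded) rows receive updates."""
    def mask_grad(grad):
        grad = grad.clone()          # hooks must not modify grad in place
        grad[:n_original_rows] = 0   # freeze the pretrained vocab rows
        return grad
    embedding.weight.register_hook(mask_grad)

# Usage with placeholder sizes: original text vocab plus expanded audio codes.
emb = torch.nn.Embedding(152000, 896)
train_only_expanded_rows(emb, n_original_rows=151936)
```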
In the technical report, the TTS adapter architecture is an additional 6 transformer blocks, but I cannot see such an architecture in the linked config. Maybe post_adapter? But there is no post_adapter in the config or the checkpoint.
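As a sanity check, one way to confirm whether a released checkpoint actually contains adapter weights is to scan its state-dict keys. The file name below is a placeholder; substitute the actual mini-omni checkpoint path:

```python
import torch

# Placeholder path; replace with the downloaded checkpoint file.
state_dict = torch.load("lit_model.pth", map_location="cpu")

# List any parameters whose names mention the adapter.
adapter_keys = [k for k in state_dict if "post_adapter" in k]
print(adapter_keys or "no post_adapter parameters found in this checkpoint")
```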