Using WhisperSpeech Pre-trained Weights for TextToSemantic

I believe that WhisperSpeech is based on SPEAR-TTS.

I want to use the pre-trained weights from the Hugging Face link above, but I don't know exactly how to do it.

The config keys for the t2s models are as follows: ["depth", "n_head", "head_width", "ffn_mult", "stoks_width", "ttoks_width", "ttoks_len", "stoks_len", "ttoks_codes", "stoks_codes"]

However, the constructor arguments for TextToSemantic are slightly different from these keys, so I am not sure whether the weights can be used with it as they are.
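To make the mismatch concrete, here is a minimal sketch of how the two sets of names could be compared. The helper `compare_config_to_constructor` and the class `StandInModel` are hypothetical names made up for this post, and `StandInModel` is only a placeholder, not the real TextToSemantic signature.

```python
import inspect

# Config keys of the WhisperSpeech t2s models, copied from the list above.
T2S_CONFIG_KEYS = [
    "depth", "n_head", "head_width", "ffn_mult", "stoks_width",
    "ttoks_width", "ttoks_len", "stoks_len", "ttoks_codes", "stoks_codes",
]


def compare_config_to_constructor(config_keys, cls):
    """Return (shared, only_in_config, only_in_constructor) as sorted lists."""
    # Parameter names accepted by the class constructor, without `self`.
    params = set(inspect.signature(cls.__init__).parameters) - {"self"}
    keys = set(config_keys)
    return sorted(keys & params), sorted(keys - params), sorted(params - keys)


class StandInModel:
    # Hypothetical placeholder so the sketch runs without extra packages.
    # Its arguments are invented and do NOT reflect TextToSemantic.
    def __init__(self, dim=512, depth=6, heads=8):
        pass


shared, only_config, only_ctor = compare_config_to_constructor(
    T2S_CONFIG_KEYS, StandInModel
)
print("shared names:       ", shared)
print("only in t2s config: ", only_config)
print("only in constructor:", only_ctor)
```

Passing the real class instead of `StandInModel` (assuming it is importable as `from voicebox_pytorch import TextToSemantic`) would list the actual differences. Matching argument names alone would not show that the weights are compatible, since the parameter names and tensor shapes in the checkpoint's state dict would also have to line up.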

Can anybody help me with this issue?

I first wanted to ask this on the discussions page, but that page seems inactive, so I apologize in advance for posting it here as an issue.

lucidrains / voicebox-pytorch, issue #49