Model selection for extracting jukebox representations

marypilataki commented 2 years ago

Hello,

I have a question about language model selection in the main script for extracting jukebox representations, i.e. the main Python script under jukemir/representations/jukebox.

There are two options for the language model, '5b' or '1b_lyrics'. However, when the parameters are set, a few conditional expressions refer to a model '5b_lyrics'. Please see the excerpt below, from line 153 onwards:

# Set up VQVAE
model = "5b"  # or "1b_lyrics"
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 3 if model == "5b_lyrics" else 8
hps.name = "samples"
chunk_size = 16 if model == "5b_lyrics" else 32
max_batch_size = 3 if model == "5b_lyrics" else 16

Is there a choice between three different models, or is "5b" identical to "5b_lyrics"? Which values did you use for n_samples, chunk_size and max_batch_size when extracting representations with the pretrained 5B-parameter language model for the datasets used in the paper?

Thank you!

rodrigo-castellon commented 2 years ago

Hi, so sorry for the very late response. To answer your question: the "5b_lyrics" model has lyric conditioning, so we did not use it at all. For n_samples, chunk_size, and max_batch_size, we used the values the code above yields when the model is not "5b_lyrics", i.e. n_samples = 8, chunk_size = 32, and max_batch_size = 16.
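
For anyone who wants to double-check which values each model name resolves to, here is a minimal sketch that mirrors the conditionals in the excerpt above. The helper name resolve_settings is hypothetical and is not part of the jukemir script; it only reproduces the three conditional expressions and does not load any model.

```python
# Hypothetical helper (not in the jukemir repo): reproduces the three
# conditional expressions from the quoted excerpt for a given model name.
def resolve_settings(model: str) -> dict:
    is_5b_lyrics = model == "5b_lyrics"
    return {
        "n_samples": 3 if is_5b_lyrics else 8,
        "chunk_size": 16 if is_5b_lyrics else 32,
        "max_batch_size": 3 if is_5b_lyrics else 16,
    }


# Both documented options ("5b" and "1b_lyrics") take the else-branch values.
settings_5b = resolve_settings("5b")
settings_1b_lyrics = resolve_settings("1b_lyrics")
print(settings_5b)
```

Since neither "5b" nor "1b_lyrics" equals "5b_lyrics", both options end up with the same three values.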

marypilataki commented 2 years ago

Thank you for the response!

p-lambda / jukemir, issue #5