Closed Hit1ron closed 1 year ago
yea, you are right! this one is a bit tricky if the sampled audio is of different lengths
let me think about it before executing; should be able to knock out this issue by week's end
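for anyone following along, the usual way to handle batches of different-length audio is to right-pad and keep a mask — a toy sketch, not the repo's actual code (shapes and names here are made up):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# three hypothetical waveforms of different lengths
waves = [torch.randn(16000), torch.randn(12000), torch.randn(8000)]

lengths = torch.tensor([w.shape[0] for w in waves])

# right-pad to the longest waveform -> (3, 16000)
batch = pad_sequence(waves, batch_first=True)

# boolean mask, True on real samples, False on padding
mask = torch.arange(batch.shape[1])[None, :] < lengths[:, None]
```

downstream attention / loss can then ignore the padded positions via `mask`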
@Hit1ron do you want to see if https://github.com/lucidrains/audiolm-pytorch/commit/896b240757a68b107964e93a6c8b7943ec819ad3 fixes the issue? i'll address variable lengthed prompts at a future date
@lucidrains yes, the issue is fixed. One suggestion: compute the init_coarse_time_step and init_fine_time_step parameters before the rearrange of coarse_token_ids and fine_token_ids, so that setting the max_time_steps parameter is easier and doesn't require accounting for the number of coarse and fine quantizers.
@Hit1ron oh yea, that is problematic
i've corrected the initial fine acoustic token timestep, and just opted to set the initial coarse acoustic token timestep to 0 and let the network decide when to eos
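in pseudocode, the "start at 0 and let the network decide when to eos" approach looks roughly like this (toy example with a hypothetical eos id and step function, not the actual sampling code):

```python
EOS_ID = 1024        # hypothetical eos token id
MAX_TIME_STEPS = 512 # hard cap so generation always terminates

def generate(step_fn):
    # start from timestep 0 rather than inferring an offset from the prompt,
    # and stop as soon as the model emits eos
    tokens = []
    for _ in range(MAX_TIME_STEPS):
        next_tok = step_fn(tokens)
        if next_tok == EOS_ID:
            break
        tokens.append(next_tok)
    return tokens

# toy step function: emits 0, 1, 2, then eos
out = generate(lambda toks: len(toks) if len(toks) < 3 else EOS_ID)
```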
During generation with the coarse transformer and the fine transformer, the acoustic tokens of prime_wav are not used, even though the paper does use them in continuation mode.
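what I'd expect for continuation mode, roughly, is that the autoregressive context is seeded with the prompt's acoustic tokens instead of starting empty — a toy sketch of the idea, not the repo's code (names and step function are hypothetical):

```python
def generate_continuation(prime_acoustic_tokens, step_fn, max_steps=8):
    # seed the context with the prompt's acoustic tokens so the
    # transformer conditions on them, as in the paper's continuation mode
    tokens = list(prime_acoustic_tokens)
    for _ in range(max_steps):
        tokens.append(step_fn(tokens))
    # return only the newly generated continuation
    return tokens[len(prime_acoustic_tokens):]

# toy step function that just increments the last token
cont = generate_continuation([1, 2, 3], lambda toks: toks[-1] + 1, max_steps=4)
```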