Open jzq2000 opened 1 year ago
Hi, could you please share some caption examples for pretraining on AudioSet? I'm a little confused about the [mask] token setting for the CLIP text encoder.
Hi, we add the MASK token in advance, before the input is passed to Diffsound. We will upload the scripts to GitHub.