yangdongchao / Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
http://dongchaoyang.top/text-to-sound-synthesis-demo/
344 stars 36 forks

provide examples with [mask] token? #16

Open jzq2000 opened 1 year ago

jzq2000 commented 1 year ago

Hi, could you please share some caption examples for pretraining on AudioSet? I'm a little confused about the [mask] token setting for the CLIP text encoder.

yangdongchao commented 1 year ago

Hi, we add the MASK token to the text in advance, before it is input to Diffsound. We will upload the scripts to GitHub.
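
Until the scripts are available, here is a minimal sketch of what "adding the MASK token in advance" could look like. It is not the authors' code. It assumes the pretraining text is built from AudioSet event labels (AudioSet ships labels, not captions), that the token is the literal string `[MASK]`, and that one mask is placed before, between, and after the labels. The helper name `build_masked_caption` is hypothetical.

```python
import random

# Assumed literal token string; the thread does not state the exact spelling.
MASK = "[MASK]"


def build_masked_caption(labels, rng=None):
    """Build a pseudo-caption from event labels with MASK tokens around them.

    Hypothetical helper: the label order is shuffled, each label is
    lower-cased, and a MASK token is placed before the first label,
    between consecutive labels, and after the last label.
    """
    rng = rng or random.Random()
    shuffled = list(labels)
    rng.shuffle(shuffled)

    parts = [MASK]
    for label in shuffled:
        parts.append(label.lower())
        parts.append(MASK)
    return " ".join(parts)


if __name__ == "__main__":
    # The resulting string would then be passed to the text encoder as-is.
    print(build_masked_caption(["Dog", "Speech"], random.Random(0)))
```

The point of the sketch is only that masking happens as a text preprocessing step, so the model itself receives an ordinary string.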