Open veqtor opened 8 years ago
@veqtor PR168 does support generic GC as you describe it, in the WaveNetModel: it accepts an arbitrary embedding vector as the global condition, as an alternative to a category id that is used to look up the embedding vector. However, train.py does not make use of this option. It is hard-wired to the VCTK corpus, where global conditioning is always based on a speaker id that is used to look up the gc embedding vector.
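To make the two modes concrete, here is a minimal sketch in plain Python. It is not the WaveNetModel code from the PR: the function name `resolve_global_condition` and its parameters are hypothetical, and the embedding table is just a list of lists standing in for a learned embedding matrix.

```python
def resolve_global_condition(embedding_table, gc_channels,
                             category_id=None, embedding=None):
    """Return a global-conditioning vector of length gc_channels.

    Hypothetical helper, for illustration only. Exactly one of
    category_id (looked up in embedding_table) or embedding (used
    as-is) must be given.
    """
    if (category_id is None) == (embedding is None):
        raise ValueError("provide exactly one of category_id or embedding")

    if category_id is not None:
        # Category mode: e.g. a speaker id indexes a row of the table.
        vector = embedding_table[category_id]
    else:
        # Generic mode: the caller supplies the vector directly.
        vector = embedding

    if len(vector) != gc_channels:
        raise ValueError(
            "expected %d channels, got %d" % (gc_channels, len(vector)))
    return [float(x) for x in vector]


# Toy table with 3 categories and gc_channels = 2.
table = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
by_id = resolve_global_condition(table, 2, category_id=1)
by_vector = resolve_global_condition(table, 2, embedding=[0.25, 0.75])
```

In this sketch, category mode corresponds to what train.py does with VCTK speaker ids, and generic mode corresponds to passing an arbitrary vector.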
A suggestion: add an option to load JSON files containing an array of gc_channels floats per audio file (foo.wav + foo.json), which is then used as the global conditioning. That way, one can devise whatever GC schema is appropriate for the application, and this could also work with other types of data readers. On generation, one would also specify a JSON file for conditioning.
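A rough sketch of what such a loader could look like, using only the standard library. The function name `load_gc_vector` is hypothetical, and the sketch assumes the JSON file holds a bare top-level array of numbers; neither is part of the existing reader.

```python
import json
from pathlib import Path


def load_gc_vector(wav_path, gc_channels):
    """Load the conditioning vector stored next to an audio file.

    Hypothetical helper: for foo.wav it reads foo.json, which is
    assumed to contain a JSON array of exactly gc_channels numbers.
    """
    json_path = Path(wav_path).with_suffix(".json")
    with json_path.open() as f:
        values = json.load(f)

    if not isinstance(values, list) or len(values) != gc_channels:
        raise ValueError(
            "%s must contain an array of %d floats" % (json_path, gc_channels))
    if not all(isinstance(v, (int, float)) for v in values):
        raise ValueError("%s contains non-numeric entries" % json_path)

    return [float(v) for v in values]
```

For example, if foo.json contains `[0.1, 0.9, 0.0]`, then `load_gc_vector("foo.wav", 3)` would return that list of floats. At generation time the same function could be pointed at whichever JSON file is given for conditioning.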
Use cases: music (the WaveNet paper describes using global conditioning with tags and music descriptors to train and generate music of specific genres), speech generation (map voice descriptors such as formant content and average fundamental frequency to scalars and see whether those can be interpolated on generation), sound experiments (can we use WaveNet to emulate specific synthesizers?)