magenta / ddsp

DDSP: Differentiable Digital Signal Processing
https://magenta.tensorflow.org/ddsp
Apache License 2.0

Adding new features #142

Closed barisbozkurt closed 4 years ago

barisbozkurt commented 4 years ago

Dear authors, thanks a lot for sharing this great tool.

I am struggling to use additional features for conditioning the f0-decoder only architecture (solo_instrument.gin). For that, I edit prepare_tfrecord.py to:

Reading the tfrecords file that was created, I see that my feature is not in it; only the features from the original implementation are. Any suggestions on where to dig to find where it gets lost? I know my feature is computed and is in the 'ex' dictionary (I added a few print statements to check that), but it does not seem to be stored in the tfrecords file.

Thanks in advance.

adarob commented 4 years ago

Do you have the code somewhere that I can have a look?

barisbozkurt commented 4 years ago

Here is my modified prepare_tfrecord.py: https://drive.google.com/file/d/1TmMjI5teWl5NtAYAE3mYPi22pd1YfMbs/view?usp=sharing

I have marked places of modification with '#MODIF:'. I am trying to experiment with singing synthesis and be able to control the vowel with a parameter. My train files are a collection of singing with a single vowel indicated in the filename.

Thanks a lot for your time.

adarob commented 4 years ago

I don't see any issues with your code. Did you do a pip install -e . to deploy your local code?

barisbozkurt commented 4 years ago

Thanks Adam, I found my error: it was in reading the tfrecords file, not in writing it.

I was assuming the existing file-read functions (in ddsp/training/data.py) would return whatever key-value pairs exist in the data. I was using the following lines of code for reading:

    data_provider = ddsp.training.data.TFRecordProvider(TFRECORD_FILEPATTERN)
    dataset = data_provider.get_batch(batch_size=1, shuffle=False)
    dataset_iter = iter(dataset)
    print(dataset_iter.element_spec)

Used as is, that reads only specific keys of the dictionary, not all of them. After modifying that function, I could access my new feature. Now I need to figure out how it gets piped through the process and fed into the decoder.

I am struggling (others in the audience may be in a similar situation) to figure out how the pipeline works with all these components (apache-beam, tfrecords, gin definitions, etc.) to the level where I can modify the architecture and use additional or different input features. I'll continue struggling :)
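For readers who hit the same thing, here is a toy model of the read-side behavior described above. It is plain Python and does not use TensorFlow or ddsp. It only illustrates the general idea that a reader which parses each record against a fixed feature spec returns just the keys named in that spec. The function `parse_record`, the spec dictionaries, and the feature name `vowel_f1` are hypothetical and are not part of the ddsp API.

```python
# Toy model (plain Python, no TensorFlow): a reader that parses each record
# against a fixed feature spec returns only the keys named in that spec.
# All names here are hypothetical and only illustrate the idea.

def parse_record(record, feature_spec):
    """Return only the features listed in feature_spec, checking their lengths."""
    parsed = {}
    for key, expected_len in feature_spec.items():
        values = record[key]
        if len(values) != expected_len:
            raise ValueError(
                f"{key}: expected {expected_len} values, got {len(values)}")
        parsed[key] = values
    return parsed

# A stored record with two standard features plus one new, hypothetical feature.
record = {
    "f0_hz": [440.0] * 125,
    "loudness_db": [-30.0] * 125,
    "vowel_f1": [700.0] * 125,  # hypothetical feature name
}

# The default spec does not know about the new feature, so it is dropped on read.
default_spec = {"f0_hz": 125, "loudness_db": 125}
# Extending the spec is what makes the new feature visible to the reader.
extended_spec = dict(default_spec, vowel_f1=125)

default_keys = sorted(parse_record(record, default_spec))
extended_keys = sorted(parse_record(record, extended_spec))
print(default_keys)
print(extended_keys)
```

In this model the new feature is present in the stored record the whole time. It only shows up once the spec used for reading lists it, which matches the symptom reported above.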

barisbozkurt commented 4 years ago

Hi Adam, I have done some experiments with adding new one-dimensional features and was happy to see that I could use them effectively. For example, I could use formant values (F1 and F2 as two separate one-dimensional features) to control the vowel space.

Now I am trying to use a two-dimensional feature, which triggers an error in prepare_tfrecord.float_dict_to_tfexample() at the step that converts to a float list.

Let's say you have f0_hz and loudness_db features of shape (125,) and you would like to add one more feature of shape (125, 10). What would you change in the code to make it work?

I have put the files I modified to add a new two-dimensional feature here. The folder contains a file listing the modifications I made to the ddsp scripts.

Thanks!
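One common way to handle the situation in the question above is to flatten the two-dimensional array to one dimension before writing it as a flat float list, and to restore the shape after reading. The sketch below shows only that round trip with NumPy. It is not the ddsp implementation, and the helper names `flatten_features` and `restore_shape` and the feature name `my_2d_feature` are hypothetical.

```python
import numpy as np

N_FRAMES, N_DIMS = 125, 10  # shapes taken from the question above

def flatten_features(float_dict):
    """Flatten every array to 1-D so it can be stored as a flat float list."""
    return {k: np.asarray(v, dtype=np.float32).reshape(-1)
            for k, v in float_dict.items()}

def restore_shape(flat_values, shape):
    """Undo the flattening on the read side, given the known target shape."""
    return np.asarray(flat_values, dtype=np.float32).reshape(shape)

# Example dictionary with two 1-D features and one hypothetical 2-D feature.
ex = {
    "f0_hz": np.full(N_FRAMES, 440.0),
    "loudness_db": np.full(N_FRAMES, -30.0),
    "my_2d_feature": np.arange(N_FRAMES * N_DIMS).reshape(N_FRAMES, N_DIMS),
}

flat = flatten_features(ex)  # every value is now 1-D
restored = restore_shape(flat["my_2d_feature"], (N_FRAMES, N_DIMS))
print(flat["my_2d_feature"].shape, restored.shape)
```

On the TensorFlow side, a flat float list of 125 * 10 values can be parsed with `tf.io.FixedLenFeature([125, 10], tf.float32)`, which returns the feature in its two-dimensional shape. Whether the rest of the ddsp pipeline accepts the extra dimension without further changes is not something this sketch covers.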