singnet / rfai-proposal


Sympler: Turn your favorite synthesizer into a sampler! #5

Open raamb opened 4 years ago

raamb commented 4 years ago

Author Nil Geisweiller

Description

Wouldn't it be nice to program your synthesizers by providing examples instead of endlessly tweaking knobs? That is what the Sympler would attempt to do, by learning how to tweak the knobs for you. It is fair to say the Sympler would make your life simpler.

Background

Programming synthesizers is notoriously difficult. Tutorials like Syntorial by Joe Hanley or books like Designing Sound by Andy Farnell exist for that reason. At heart, it is hard because it is an inverse problem: finding an input that produces a desired output, given a function mapping inputs to outputs. Here the input is a collection of parameter settings, the output is a sound, and the function is a synthesizer. On top of that, the notion of "desired output" in this context is tied to psychoacoustics and ultimately to artistic judgment, which adds another level of difficulty.

Learning tasks

There are at least two ways to approach this problem with artificial intelligence.

  1. Unsupervised learning

Provide a sound and a measure of sound similarity, and let the learning algorithm search the space of parameters to minimize the distance between the provided and the synthesized sounds.

This requires defining a measure of sound similarity. There are many ways to do that, and it is an ongoing subject of research; a popular approach seems to be measuring the distance between the Mel-Frequency Cepstrum Coefficients (MFCC) https://en.wikipedia.org/wiki/Mel-frequency_cepstrum of the target and the synthesized sounds. MFCCs seem well suited to characterizing the timbre of vocal and musical sounds in accordance with psychoacoustics.
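For concreteness, here is a minimal sketch of this unsupervised approach in Python, assuming librosa for MFCC extraction and a hypothetical render(params) function that invokes the software synthesizer headlessly and returns the audio as a NumPy array. A real implementation would replace the naive random search with a genetic algorithm or another optimizer.

```python
import numpy as np
import librosa

def mfcc_distance(a, b, sr=44100, n_mfcc=13):
    """Euclidean distance between the MFCC matrices of two sounds,
    truncated to the length of the shorter one."""
    ma = librosa.feature.mfcc(y=a, sr=sr, n_mfcc=n_mfcc)
    mb = librosa.feature.mfcc(y=b, sr=sr, n_mfcc=n_mfcc)
    n = min(ma.shape[1], mb.shape[1])
    return float(np.linalg.norm(ma[:, :n] - mb[:, :n]))

def unsupervised_search(target, render, n_params, iterations=1000, rng=None):
    """Naive search: sample normalized parameter vectors in [0, 1]^n_params,
    render each one, and keep the setting closest to the target sound."""
    rng = rng or np.random.default_rng()
    best_params, best_dist = None, np.inf
    for _ in range(iterations):
        params = rng.uniform(0.0, 1.0, n_params)
        dist = mfcc_distance(target, render(params))
        if dist < best_dist:
            best_params, best_dist = params, dist
    return best_params, best_dist
```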

  2. Supervised learning

Provide a corpus of pairs

(sound-1, parameter-settings-1)
...
(sound-n, parameter-settings-n)

to a learning algorithm to produce a model mapping sound to parameter-settings.

The corpus could be generated by randomly setting, in a constrained manner, the parameters of a given synthesizer to obtain parameter-settings-i, and recording the synthesized sound to obtain sound-i.

Then, a new sound could be fed to such a model to obtain parameter settings that would hopefully make the synthesizer imitate that new sound. If the result is bad, one could fall back to solving the problem in the unsupervised manner described above; if the result is good, the pair

(new-sound, new-parameter-settings)

could be added to the corpus for subsequent supervised learning.
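As a rough illustration, here is how corpus generation could look in Python; render(params) is the same hypothetical synthesizer-invocation function as above, and the mean MFCC vector is just one possible fixed-size sound representation.

```python
import numpy as np
import librosa

def sound_features(sound, sr=44100, n_mfcc=13):
    """Summarize a sound as its mean MFCC vector (one possible representation)."""
    return librosa.feature.mfcc(y=sound, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def build_corpus(render, n_params, n_examples=10000, rng=None):
    """Generate (sound-i, parameter-settings-i) pairs by sampling the
    synthesizer's parameter space at random and rendering each setting."""
    rng = rng or np.random.default_rng()
    X, y = [], []
    for _ in range(n_examples):
        params = rng.uniform(0.0, 1.0, n_params)  # constrained random settings
        X.append(sound_features(render(params)))
        y.append(params)
    return np.array(X), np.array(y)
```

The fallback loop described above then amounts to: if a predicted setting reproduces the new sound well enough (e.g. its mfcc_distance to the target falls below some threshold), append the new pair to X and y before retraining; otherwise run the unsupervised search.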

Practicalities

It makes sense to work with a software synthesizer, as opposed to hardware, since the synthesizer will be invoked repeatedly, either directly during unsupervised learning or indirectly to generate the corpus for supervised learning. A hardware synthesizer could be used as well, but it would be harder to set up and slower to run.

As a software synthesizer I would suggest ZynAddSubFX http://zynaddsubfx.sourceforge.net/ for its rich synthesis engine. It would allow experimenting with different types of synthesis, different classes of parameters and sounds, and different levels of difficulty, while keeping the same experimental infrastructure. It also supports the Open Sound Control (OSC) protocol, which might make it easier to interface with.
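For example, with the python-osc library one could set a synthesizer parameter remotely. This is only a sketch: it assumes ZynAddSubFX is listening for OSC on UDP port 7777, and the parameter path is illustrative rather than a verified address from ZynAddSubFX's OSC namespace.

```python
from pythonosc.udp_client import SimpleUDPClient

# Assumed setup: ZynAddSubFX running locally with OSC on UDP port 7777.
client = SimpleUDPClient("127.0.0.1", 7777)

# Illustrative parameter path; the real address would come from
# ZynAddSubFX's documented OSC namespace.
client.send_message("/part0/Pvolume", 100)
```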

Or, as a simpler alternative, I would suggest OPNMIDI, an FM synthesizer, because FMProg, genetic programming code by Jean Pierre Cimalando that evolves patches for it (unsupervised learning), already exists. This simpler alternative would be a way to build upon FMProg rather than starting from scratch. However, having experimented with it, it does not perform well, so there is still a lot of room for improvement. It is not clear to me whether FMProg's failure is due to a poor genetic algorithm or a poor sound similarity metric; probably a bit of both. It uses MFCC, as suggested in this proposal, so if its failure is an indication that MFCC is a poor metric, then more work would be required to improve the metric as well.

Metrics

A precise metric will have to be defined, but a distance based on Mel-Frequency Cepstrum Coefficients (MFCC) between the sounds of the training corpus and those produced by unsupervised and supervised learning would be a good start. If an MFCC-based metric turns out to be a bad one, the results will have to be evaluated otherwise, which might ultimately be rather subjective.
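Concretely, the evaluation could be as simple as the mean distance over a set of held-out target sounds, with something like the mfcc_distance sketch above passed in as the distance function:

```python
import numpy as np

def corpus_score(targets, reproductions, distance):
    """Mean distance (e.g. the mfcc_distance sketch above) between each
    target sound and the sound the learned settings produce; lower is better."""
    return float(np.mean([distance(t, r) for t, r in zip(targets, reproductions)]))
```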

Learning Algorithms

The choice of learning algorithm would be up to the user, from DNNs to SVMs to any existing technique (maybe even MOSES, an OpenCog program learner).
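As one possible instantiation, a scikit-learn regressor could serve as the supervised model; X and y are the feature and parameter arrays from the corpus sketch above, and the network size is an arbitrary placeholder.

```python
from sklearn.neural_network import MLPRegressor

def fit_inverse_model(X, y):
    """Fit a model mapping sound features to parameter settings.
    Any multi-output regressor with the scikit-learn API could be
    swapped in here."""
    model = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500)
    model.fit(X, y)
    return model
```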

Non-functional Requirements

Open-source software is mandatory.

Expiration Date

20 December 2020