Wouldn't it be nice to program your synthesizers by providing examples
instead of endlessly tweaking knobs? That is what the Sympler would
attempt to do, by learning how to tweak the knobs for you. It is fair
to say that the Sympler would make your life simpler.
Background
Programming synthesizers is notoriously difficult. Tutorials like
Syntorial by Joe Hanley or books like Designing Sound by Andy Farnell
exist for that reason. At heart, it is hard because it is an inverse
problem: finding an input that produces a desired output, given a
function mapping input to output. Here the input is a collection of
parameter settings, the output is a sound and the function is a
synthesizer. On top of that, the notion of "desired output" in this
context is tied to psychoacoustics and ultimately artistic judgment,
which adds another level of difficulty.
Learning tasks
There are at least two ways to approach this problem with artificial
intelligence.
Unsupervised learning
Provide a sound, a measure of sound similarity, and let the learning
algorithm search the space of parameters to minimize the distance
between the provided and the synthesized sounds.
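To make this concrete, here is a minimal sketch of such a search in
Python with NumPy. The "synthesizer" is a hypothetical toy with just
two knobs (frequency and amplitude), and a plain magnitude-spectrum
distance stands in for a proper similarity measure; a real setup would
drive an actual synthesizer and compare MFCCs.

```python
import numpy as np

SR = 16000   # sample rate in Hz
N = 2048     # samples per rendered sound

def synth(freq, amp):
    """Hypothetical stand-in for a real synthesizer: one sine
    oscillator with two 'knobs', frequency and amplitude."""
    t = np.arange(N) / SR
    return amp * np.sin(2 * np.pi * freq * t)

def distance(a, b):
    """Simple magnitude-spectrum distance; a real system would
    compare MFCCs instead."""
    return np.linalg.norm(np.abs(np.fft.rfft(a)) - np.abs(np.fft.rfft(b)))

def search(target, iters=1500, seed=0):
    """Search the parameter space: random global exploration first,
    then local refinement around the best candidate found."""
    rng = np.random.default_rng(seed)
    best = np.array([rng.uniform(100.0, 1000.0), rng.uniform(0.1, 1.0)])
    best_d = distance(synth(*best), target)
    for i in range(iters):
        if i < iters // 2:
            # Global phase: fresh random candidate.
            cand = np.array([rng.uniform(100.0, 1000.0),
                             rng.uniform(0.1, 1.0)])
        else:
            # Local phase: small perturbation of the best so far.
            cand = best + rng.normal(0.0, [5.0, 0.05])
        d = distance(synth(*cand), target)
        if d < best_d:
            best, best_d = cand, d
    return best, best_d

target = synth(440.0, 0.8)   # pretend this is the provided sound
best, best_d = search(target)
```

Any black-box optimizer (genetic algorithm, CMA-ES, ...) could replace
the naive search loop; only the distance function and the synthesizer
invocation are essential to the scheme.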
This requires defining a measure of sound similarity. There are many
ways to do that, and it is an ongoing subject of research; however, a
popular approach seems to be measuring the distance between the
Mel-Frequency Cepstral Coefficients (MFCC)
https://en.wikipedia.org/wiki/Mel-frequency_cepstrum of the target and
the synthesized sounds. The MFCC seems well suited to characterizing
the timbre of vocal and musical sounds in accordance with
psychoacoustics.
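As an illustration, below is a minimal single-frame MFCC computation
in Python with NumPy (power spectrum, triangular mel filterbank, log,
unnormalised DCT-II), with a Euclidean distance on the resulting
coefficients. A production system would average over many frames and
use a tuned implementation; this is only a sketch of the pipeline.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_coeffs=13):
    """Minimal single-frame MFCC: power spectrum -> mel filterbank
    -> log -> DCT-II, keeping the first n_coeffs coefficients."""
    spec = np.abs(np.fft.rfft(signal, n_fft)) ** 2
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Triangular filters spaced evenly on the mel scale up to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i, k] = (right - k) / max(right - center, 1)
    logmel = np.log(fbank @ spec + 1e-10)
    j, n = np.arange(n_coeffs)[:, None], np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * j * (2 * n + 1) / (2 * n_mels))
    return dct @ logmel

def mfcc_distance(a, b):
    """Euclidean distance between the MFCC vectors of two sounds."""
    return np.linalg.norm(mfcc(a) - mfcc(b))
```

Two sines close in pitch come out closer under this distance than two
sines far apart, which is the behaviour we want from a perceptual
similarity measure.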
Supervised learning
Provide a corpus of pairs
(sound-i, parameter-settings-i)
to a learning algorithm to produce a model mapping sound to
parameter-settings.
The corpus could be generated by randomly setting, in a constrained
manner, the parameters of a given synthesizer to obtain
parameter-settings-i, and recording the synthesized sound to obtain
sound-i.
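A sketch of this corpus generation, again against a hypothetical
two-knob toy synthesizer (a real setup would render patches on an
actual software synthesizer):

```python
import numpy as np

SR, N = 16000, 2048

def synth(freq, amp):
    """Hypothetical stand-in for rendering a patch on a real
    synthesizer."""
    t = np.arange(N) / SR
    return amp * np.sin(2 * np.pi * freq * t)

def make_corpus(size=100, seed=0):
    """Draw constrained random parameter settings and record each
    synthesized sound, yielding (sound-i, parameter-settings-i)
    pairs."""
    rng = np.random.default_rng(seed)
    corpus = []
    for _ in range(size):
        params = (rng.uniform(100.0, 1000.0),  # constrained freq range
                  rng.uniform(0.1, 1.0))       # constrained amp range
        corpus.append((synth(*params), params))
    return corpus

corpus = make_corpus()
```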
Then, a new sound could be input to such a model to obtain the
parameters that would hopefully make the synthesizer imitate that new
sound. If the result is bad, the problem could be attacked in an
unsupervised manner as described above; if the result is good, the
pair
(new-sound, new-parameter-settings)
could be added to the corpus for subsequent supervised learning.
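The whole loop can be sketched as follows, with a nearest-neighbour
lookup standing in for the learned model, and the same hypothetical
toy synthesizer and crude spectral distance as above; the unsupervised
fallback itself is not shown.

```python
import numpy as np

SR, N = 16000, 2048

def synth(freq, amp):
    """Hypothetical toy synthesizer: one sine oscillator."""
    t = np.arange(N) / SR
    return amp * np.sin(2 * np.pi * freq * t)

def features(sound):
    """Crude spectral features standing in for MFCCs."""
    return np.abs(np.fft.rfft(sound))

def distance(a, b):
    return np.linalg.norm(features(a) - features(b))

def predict(corpus, new_sound):
    """Nearest-neighbour 'model': return the parameters of the
    closest corpus sound."""
    return min(corpus, key=lambda pair: distance(pair[0], new_sound))[1]

def imitate(corpus, new_sound, threshold):
    """Try the supervised model first; keep good results for later
    supervised training."""
    params = predict(corpus, new_sound)
    if distance(synth(*params), new_sound) <= threshold:
        # Good result: add (new-sound, new-parameter-settings).
        corpus.append((new_sound, params))
        return params
    # Bad result: fall back to unsupervised search (not shown here).
    return None

rng = np.random.default_rng(0)
corpus = [(synth(f, a), (f, a))
          for f, a in zip(rng.uniform(100.0, 1000.0, 200),
                          rng.uniform(0.1, 1.0, 200))]
result = imitate(corpus, synth(440.0, 0.8), threshold=500.0)
```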
Practicalities
It makes sense to work with a software synthesizer, as opposed to
hardware, as it will be invoked either directly during unsupervised
learning, or indirectly to generate the corpus for supervised
learning. A hardware synthesizer could be used as well, but would be
harder to set up and slower to run.
As a software synthesizer I would suggest ZynAddSubFX
http://zynaddsubfx.sourceforge.net/ for its rich synthesis engine.
This would allow experimenting with different types of synthesis,
different classes of parameters and sounds, and different levels of
difficulty, while maintaining the same experimental infrastructure.
It also supports the Open Sound Control protocol, which might make it
easier to interface with.
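For illustration, here is how a single-float OSC 1.0 message is laid
out on the wire. Note that the parameter address used below is purely
hypothetical, not ZynAddSubFX's actual address space, and in practice
a library such as python-osc would handle the encoding.

```python
import struct

def osc_pad(b: bytes) -> bytes:
    """OSC strings are null-terminated and padded to a 4-byte
    boundary (always at least one null)."""
    return b + b"\x00" * (4 - len(b) % 4)

def osc_message(address: str, value: float) -> bytes:
    """Encode a single-float OSC 1.0 message: padded address,
    padded type tag string, big-endian float32 argument."""
    return (osc_pad(address.encode())      # e.g. "/some/param"
            + osc_pad(b",f")               # type tags: one float32
            + struct.pack(">f", value))    # big-endian IEEE 754

# Hypothetical parameter path, for illustration only.
msg = osc_message("/part0/kit0/adpars/volume", 0.75)
# The datagram would then be sent over UDP to the synth's OSC port.
```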
As a simpler alternative I would suggest OPNMIDI, an FM synthesizer,
because there already exists some genetic programming code evolving
patches for it (unsupervised learning), called FMProg, by the same
author, Jean Pierre Cimalando. This simpler alternative could be a
way to build upon FMProg rather than starting from scratch. However,
having experimented with it, it does not perform well, so there is
still a lot of room for improvement. It is not clear to me whether
the failure of FMProg is due to a poor genetic algorithm or a poor
sound similarity metric; probably a bit of both. It uses MFCC, as
suggested in this proposal; if that is an indication that MFCC is a
poor metric, then more work would be required to improve that as
well.
Metrics
A precise metric will have to be defined, but a Mel-Frequency
Cepstral Coefficients (MFCC) based distance between the sounds of the
training corpus and the ones produced by unsupervised and supervised
learning would be a good start. If an MFCC-based metric turns out to
be a bad one, then the results will have to be evaluated otherwise,
which might ultimately be rather subjective.
Learning Algorithms
The learning algorithms would be up to the user to choose: from DNNs
to SVMs, to any existing technique available (maybe even including
MOSES, an OpenCog program learner).
Author
Nil Geisweiller
Non-functional Requirements
Open-source software is mandatory.
Expiration Date
20 December 2020