jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Neural Audio Style #86

Open 3DTOPO opened 9 years ago

3DTOPO commented 9 years ago

I wonder what optimizations would be needed to feed it some audio. The dimensions of an audio file seem fairly different from those of a typical image file.

I imagine the results could be quite bizarre but I think it warrants further investigation!

jcjohnson commented 9 years ago

This ought to be possible if you had access to a powerful pretrained model that could recognize audio; however to the best of my knowledge no such model exists.

3DTOPO commented 9 years ago

That makes sense, thank you. I was thinking I could dump audio to a PNG raster and see what happens, but since it's not trained for audio I guess I would only expect to get noise out.

3DTOPO commented 9 years ago

I still can't help but wonder what it would sound like using a spectral representation of audio as the input, even if it is just a second or two of audio. It seems like it might attempt to match the spectral signatures. I might have to write a little utility to find out. ;)

3DTOPO commented 9 years ago

It actually kind of works. It needs more experimenting, but with the free demo of Photosounder you can export and import BMPs as audio. I am getting discernible audio out of the neural net, though nothing really compelling yet. I just wish Photosounder exported in color (like a spectrum); then I think the algorithm would do a better job of matching sounds. Anyhow, that would be trivial to code.
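For readers who want to try the audio-to-image step without Photosounder, here is a minimal sketch of turning samples into a grayscale spectrogram image. It is not Photosounder's actual mapping, which is not described in this thread; the function name `spectrogram_image`, the frame size, the hop, and the linear 0-255 scaling are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram_image(samples, frame_size=64, hop=32):
    """Return a grayscale image as a list of pixel rows (values 0-255).

    Rows are frequency bins (lowest first), columns are time frames.
    Hypothetical helper: the parameters are illustrative assumptions.
    """
    half = frame_size // 2
    # Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, acceptable for a sketch; an FFT would be used in practice.
        mags = []
        for k in range(half):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        columns.append(mags)
    peak = max(max(col) for col in columns) or 1.0
    # Transpose so rows are frequencies, and scale magnitudes to 8-bit pixels.
    return [[round(255 * col[k] / peak) for col in columns]
            for k in range(half)]


# Toy input: a pure tone that sits exactly on frequency bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
image = spectrogram_image(tone)
brightest_row = max(range(len(image)), key=lambda k: image[k][0])
```

Note that this keeps only magnitudes and throws away phase, so mapping a stylized image back to audio needs some form of phase estimation. That is one reason the round trip is lossy.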

jcjohnson commented 9 years ago

Wow, that's pretty cool! I'd love to see some examples if you have them.

3DTOPO commented 9 years ago

Thanks, and absolutely. Here you go: http://glassprinted.com/neuralAudioTest.zip

I colorized the waveforms in Photoshop, and that seemed to help a little. Here is HAL 9000 mixed with a jazz piano riff of the same duration. The originals are attached; the result after 1000 iterations is colorAudioTest2_1000.png. The demo of Photosounder won't let me save audio files, so you will have to install it to play them. The original HAL 9000 clip .wav file is in the folder, but since I cropped the piano riff in Photoshop, you will have to play it in Photosounder if you want to listen to it.

I think there is a great deal of promise here, actually, but an algorithm would have to convert, say, a whole song into a much more graphically rich representation. That is not hard for me to imagine: it would be, in essence, a music visualizer.

Photosounder link: http://photosounder.com

3DTOPO commented 9 years ago

Oh yeah, I used the piano riff as the style with default settings. The only exception was that I used only noise for the seed.

3DTOPO commented 9 years ago

Here is a Caffe audio model, or at least the start of one. There is not much there yet. It looks like it can currently only learn spoken numbers, but it seems like it could be fun to play with. https://github.com/pannous/caffe-speech-recognition

3DTOPO commented 9 years ago

Guess for something like music I really would want to teach it chords and rhythm and so on. Beyond my budget! :/

rrshaban commented 8 years ago

@3DTOPO could you give a bit more explanation of what each file is in your .zip? It looks like (correct me where I'm mistaken) you've visualized the waveforms of an audio clip and are then applying neural style to extract the style – are you outputting that style (with noise/blank content) and then attempting to map it back to audio? It seems that the mapping back could be difficult.

It seems that an RNN might be better for your use case: https://github.com/karpathy/char-rnn

3DTOPO commented 8 years ago

@rrshaban I converted the "HAL 9000.wav" to a waveform "halColor.png", and a jazz phrase into a waveform "pianoJingleCrop.png". Then fed "halColor.png" as the input image and "pianoJingleCrop.png" in as the style image to neural-style. The resulting file is "colorAudioTest2_1000.png" after 1000 iterations. The demo software that was used to create the waveforms creates a playable .wav file from the resulting "colorAudioTest2_1000.png" file no problem. I can't export that file because I don't want to purchase Photosounder.
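As a rough illustration of the run described above, the following sketch only assembles a neural-style command line; it does not execute anything. The helper name `neural_style_command` and the output filename `colorAudioTest2.png` are assumptions (the thread only names the saved result, `colorAudioTest2_1000.png`), and the exact options used in the original experiment are not stated.

```python
def neural_style_command(content, style, output, iterations=1000):
    """Build the argument list for neural_style.lua (nothing is run here).

    Hypothetical helper for illustration only.
    """
    return [
        "th", "neural_style.lua",
        "-content_image", content,
        "-style_image", style,
        "-output_image", output,  # assumed name, not given in the thread
        "-num_iterations", str(iterations),
        "-init", "random",  # start from noise rather than the content image
    ]


cmd = neural_style_command("halColor.png", "pianoJingleCrop.png",
                           "colorAudioTest2.png")
print(" ".join(cmd))
```

The resulting image would then be loaded back into Photosounder, as described above, to hear it.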

Anyhow - my curiosity just got the better of me and thanks for the link - I'll check it out.

ryanpamplin commented 8 years ago

@3DTOPO I downloaded the app and listened to your clips. Really cool out of the box thinking!

3DTOPO commented 8 years ago

Thanks @ryanpamplin

I wonder what it would do to 3D models? E.g. a content 3D model and style 3D model(s).

Or what about 4D for protein folding model/styles? Whoa.

=)

Smife commented 8 years ago

Is it possible to apply deep learning AI to midi files to create automatic compositions in a particular style? It seems to me this may yield some interesting results, although I am not a programmer. What do you think?

htoyryla commented 8 years ago

Smife notifications@github.com wrote on 31.1.2016 at 21.07:

Is it possible to apply deep learning AI to midi files to create automatic compositions in a particular style? It seems to me this may yield some interesting results, although I am not a programmer. What do you think?

I have been thinking of doing this with an RNN (recurrent neural network). I have previously experimented a little with algorithmic composition and with using n-grams to predict how a melody might continue. The crucial question to me is how to represent the musical information: melody is simple, while the temporal element (rhythm) is probably more critical. So I would probably not use MIDI data as such but convert it to something more suitable.
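To make the n-gram idea concrete, here is a minimal sketch, not the code used in those experiments. It assumes a hypothetical encoding in which each token is a (MIDI pitch, duration in beats) pair, so that rhythm stays part of the vocabulary; the function names and the toy tune are made up for illustration.

```python
from collections import Counter, defaultdict


def train_ngrams(notes, n=3):
    """Count which note follows each (n-1)-note context."""
    model = defaultdict(Counter)
    for i in range(len(notes) - n + 1):
        context = tuple(notes[i:i + n - 1])
        model[context][notes[i + n - 1]] += 1
    return model


def continue_melody(model, seed, length, n=3):
    """Greedily extend `seed` with the most frequent continuation."""
    melody = list(seed)
    for _ in range(length):
        context = tuple(melody[-(n - 1):])
        if context not in model:
            break  # unseen context; a fuller system would back off to shorter n-grams
        melody.append(model[context].most_common(1)[0][0])
    return melody


# Hypothetical tokens: (MIDI pitch, duration in beats).
a, b, c, g = (60, 1.0), (62, 0.5), (64, 0.5), (67, 2.0)
tune = [a, b, c, a, b, c, a, b, g]
model = train_ngrams(tune, n=3)
extended = continue_melody(model, seed=[a, b], length=3)
```

Greedy selection always picks the most common continuation, so it tends to loop; sampling in proportion to the counts would give more varied melodies.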

Char-rnn on GitHub does something similar with text, at the character level. I tested it by making it learn the Finnish national epic Kalevala, and it produced quite interesting results; it kept very well to the correct meter, for instance.

Hannu

PS. As far as I understand, this email thread is not for general discussion but for solving issues with neural-style. I feel a more suitable place might be needed for discussions on the creative use of neural networks (including but not limited to neural-style). I have been experimenting with neural-style since last September and have also made some hacks to the code (such as using two content images), but I have been reluctant to bring them up here, because as far as I am concerned neural-style works fine, and the things I might want to try I can manage on my own.

htoyryla commented 8 years ago

Addenda:

There seems to be quite a lot happening in this area. Just google "rnn midi", for instance. One interesting page is http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/

Feeding music in ABC notation through char-rnn is something I have also been thinking of as an easy thing to try.
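A minimal sketch of what character-level preparation of ABC text could look like, assuming a made-up tune and a simple sliding-window scheme. This illustrates the general idea of next-character prediction, not char-rnn's own preprocessing, which is written in Lua and differs in detail.

```python
# A short tune in ABC notation (hypothetical example, not from the thread).
abc_tune = "X:1\nT:Sketch\nM:4/4\nK:C\nCDEF GABc|c2 G2 E2 C2|\n"

# Character-level vocabulary: every distinct character gets an integer id.
vocab = sorted(set(abc_tune))
char_to_id = {ch: i for i, ch in enumerate(vocab)}
id_to_char = {i: ch for ch, i in char_to_id.items()}

encoded = [char_to_id[ch] for ch in abc_tune]

# Training pairs for next-character prediction: the input is a window of ids,
# the target is the same window shifted one step to the right.
seq_len = 8
pairs = [(encoded[i:i + seq_len], encoded[i + 1:i + seq_len + 1])
         for i in range(len(encoded) - seq_len)]

# Decoding the ids recovers the original text exactly.
decoded = "".join(id_to_char[i] for i in encoded)
```

Because ABC is plain text with pitch and duration written inline, a character-level model sees melody and rhythm in the same stream, which is what makes it an easy first experiment.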

Hannu

3DTOPO commented 8 years ago

Check out WaveNet: it is capable of more natural-sounding speech than the previous state of the art, and there are some music examples at the bottom of the page:

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

iver56 commented 8 years ago

Also check out "Evolving neural networks for cross-adaptive audio effects". It is capable of making one sound similar to another. For example, it can process white noise so that it sounds like a drum loop. Audio examples are available here: http://crossadaptive.hf.ntnu.no/index.php/2016/06/27/evolving-neural-networks-for-cross-adaptive-audio-effects/

The software is available here: https://github.com/iver56/cross-adaptive-audio

rupeshs commented 7 years ago

Check out neural song style transfer, which transfers the style from one song onto another using artificial intelligence: https://github.com/rupeshs/neuralsongstyle

Demo: https://www.youtube.com/watch?v=iUujo7i6P3w