salu133445 / musegan

An AI for Music Generation
https://salu133445.github.io/musegan/
MIT License

How to convert wav into midi? #59

Closed techcentaur closed 5 years ago

techcentaur commented 5 years ago

I've been trying to convert a wav file into a midi file using WaoN, a waves-to-notes transcriber. The problem is that, irrespective of which instruments are in the .wav file, the .mid always has a single piano instrument and hence one track. I need multitrack output, by which I mean separate tracks for piano, drums, and so on.

Using the Python module pypianoroll:

For a song that is already in midi, say "Bohemian Rhapsody", I get output like this:

>>> pypianoroll.Multitrack('Bohemian_Rhapsody.mid')
Multitrack(tracks=['', '', '', '', '', '', '', '', '', '', '', '', ''], tempo=array([78.000078, 78.000078, 78.000078, ..., 78.000078, 78.000078,
   78.000078]), downbeat=array([ True, False, False, ..., False, False, False]), beat_resolution=24, name=unknown)

This shows that it has 13 different tracks.

But when I convert a wav of the same song into midi using WaoN, it shows me

>>> pypianoroll.Multitrack('Bohemian_Rhapsody.mid')
Multitrack(tracks=[''], tempo=array([120., 120., 120., ..., 120., 120., 120.]), downbeat=None, beat_resolution=24, name=unknown)

With a single track.

This happens every time, with every file; I never get a multitrack result.

I can use some help. Can anyone point out what's wrong? Or give a suggestion on how to proceed?

salu133445 commented 5 years ago

I guess WaoN only detects the notes and does not take instruments into account. The 'piano' program is just its default instrument.

Transcribing a raw audio mixture into a multitrack pianoroll is a different task, and a far more challenging one. The only feasible approach I can come up with so far is to first run a source separation algorithm and then run a single-instrument transcription algorithm on each separated track.
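The two-stage idea above (separate first, then transcribe each stem) can be sketched as below. This is only a structural sketch: `separate` and `transcribe` are hypothetical callables standing in for whatever source separation and single-instrument transcription tools are actually used, and the `Note` tuple layout is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed note representation: (onset in seconds, offset in seconds, MIDI pitch).
Note = Tuple[float, float, int]


def transcribe_mixture(
    mixture: Any,
    separate: Callable[[Any], Dict[str, Any]],
    transcribe: Callable[[Any], List[Note]],
) -> Dict[str, List[Note]]:
    """Split a mixture into stems, then transcribe each stem on its own.

    `separate` (hypothetical) maps the mixed audio to {stem name: stem audio}.
    `transcribe` (hypothetical) maps one stem's audio to a list of notes.
    The result has one note list per stem, i.e. one entry per output track.
    """
    stems = separate(mixture)
    return {name: transcribe(stem) for name, stem in stems.items()}
```

Each key of the returned dictionary would become one track of the multitrack MIDI file; the quality of the result depends entirely on the two tools plugged in.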

techcentaur commented 5 years ago

Alright, that is true. One more thing that I wanted to ask:

I was looking at the Lakh MIDI Dataset and tried inspecting the track names in the midi files, using

>>> roll = pypianoroll.Multitrack('file.mid')
>>> [track.name for track in roll.tracks]

Some results I got are:

[*] Filename: /Music/lmd_matched/S/S/S/TRSSSJW128F146D5DC/5775cbedfbd4a93993bd333e33527a42.mid
[.] N-tracks = 8
[.] pianoroll shape (7008, 128, 8)

Percussion
Track 4
Track 1
Track 2
Track 2
Track 6
Track 3
Track 3

[*] Filename: /Music/lmd_matched/S/S/S/TRSSSEF128F14A2B2E/3d1f10dc4d05156188d2dce330b110b7.mid
[.] N-tracks = 13
[.] pianoroll shape (9696, 128, 13)

Bassdrum
Snare
Hihat
Ride
Cymbals
Toms
Tambourin
Clean Gtr
Fing.Bass
Funk Gtr
Vocal-Lin
Banjo
Jazz Gtr.

[*] Filename: /Music/lmd_matched/S/S/S/TRSSSEF128F14A2B2E/aca505a03c1246dbea964b206b4c66af.mid
[.] N-tracks = 9
[.] pianoroll shape (12384, 128, 9)

LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston
LISTEN TO THE MUSIC ;Words and music by Tom Johnston

So the track names are noisy.

I wanted to ask how to process this type of data. As far as I can see in the musegan code, there are 5 tracks, and they must correspond to some instruments like Drums, Piano, ... in some order. I want to know how to process such noisy files into the type of data that musegan uses. Please help!

salu133445 commented 5 years ago

The track names can be noisy as they are plain text and do not impact the playback. MIDI creators can set whatever names they like or simply leave them empty.

We instead used the program number to obtain the program/instrument information. The program number (0-127) is defined in the MIDI specification and assigns a track the program/instrument to use during playback. The program number is usually more reliable because it directly affects the playback. You can access it via track.program in Pypianoroll.

Most MIDI files use the General MIDI 1 Sound Set as the matching table for program numbers and programs/instruments. Please see here for more information.
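As a rough illustration of how a program number maps to an instrument, the General MIDI 1 Sound Set arranges its 128 programs into 16 families of 8 consecutive programs each, so the family can be read off with an integer division. The sketch below assumes 0-based program numbers (0-127, as used above) and a separate drum flag, since percussion in General MIDI is identified by its channel and not by its program number. The function name `gm_family` is made up for this example.

```python
# General MIDI 1 instrument families, one per block of 8 programs (0-based).
GM_FAMILIES = [
    "Piano", "Chromatic Percussion", "Organ", "Guitar",
    "Bass", "Strings", "Ensemble", "Brass",
    "Reed", "Pipe", "Synth Lead", "Synth Pad",
    "Synth Effects", "Ethnic", "Percussive", "Sound Effects",
]


def gm_family(program: int, is_drum: bool = False) -> str:
    """Return the GM1 family for a 0-based program number (0-127).

    `is_drum` is passed separately because a drum track is marked by its
    channel, so its program number does not describe the instrument.
    """
    if is_drum:
        return "Drums"
    if not 0 <= program <= 127:
        raise ValueError(f"program must be in 0-127, got {program}")
    return GM_FAMILIES[program // 8]
```

With Pypianoroll this could be applied per track as `gm_family(track.program, track.is_drum)`, which ignores the free-text track names entirely.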

techcentaur commented 5 years ago

The link explains what you said perfectly. Still, when I look at the instruments given by track.program in the files, the variation is too much. The set of tracks changes from file to file: sometimes it is [Guitar, Piano]; sometimes [Guitar, Guitar, Bass, Organ]; sometimes [Percussion]; sometimes it is empty.

Nevertheless, when I want to feed data into musegan, I think each index along the last (track) axis of the array (the 5 in (2291 x 6 x 96 x 84 x 5)) must correspond to a consistent musical instrument (or program) across all files.

I am thinking of choosing the 5 most frequently used instruments and then selecting only the files that have them.

What do you think about it? I wanted to know how you dealt with a situation like this while coding musegan, more specifically the pre-processing you did on such midi files. Can you share the code or files you used to pre-process the midi files into the .npy array?
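The selection idea proposed in this comment (keep the most frequent instruments, then keep only files that contain all of them) can be sketched as follows. It assumes the program numbers of each file have already been collected into a dictionary; the helper names and the choice to count each program once per file are assumptions made for illustration, not anything taken from musegan.

```python
from collections import Counter
from typing import Dict, Iterable, List


def top_programs(files: Dict[str, List[int]], k: int = 5) -> List[int]:
    """Return the k programs that appear in the largest number of files."""
    counts: Counter = Counter()
    for programs in files.values():
        # A program used by several tracks of one file still counts once.
        counts.update(set(programs))
    return [program for program, _ in counts.most_common(k)]


def files_with_all(files: Dict[str, List[int]], wanted: Iterable[int]) -> List[str]:
    """Return the (sorted) names of files containing every wanted program."""
    wanted_set = set(wanted)
    return sorted(name for name, programs in files.items()
                  if wanted_set <= set(programs))
```

Requiring all of the chosen programs in every file may discard a large share of the dataset, which is one reason to consider grouping programs into broader categories instead.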

salu133445 commented 5 years ago

Here is an example for making the LPD-5 dataset. You might find something helpful.
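For readers without access to the linked example, here is a minimal sketch of one way to group arbitrary tracks into five fixed categories by program number, so that every file ends up with the same track layout. The category rules below (drum flag first, then General MIDI family by `program // 8`, with everything else falling into Strings) are an assumption about how such a merge could be done; the linked example is the authoritative reference. Tracks are represented here as plain `(program, is_drum)` pairs to keep the sketch self-contained.

```python
from typing import Dict, List, Sequence, Tuple

CATEGORIES = ("Drums", "Piano", "Guitar", "Bass", "Strings")


def categorize_tracks(tracks: Sequence[Tuple[int, bool]]) -> Dict[str, List[int]]:
    """Group track indices into five categories.

    Each track is a (program, is_drum) pair with a 0-based program number.
    Assumed rules: drum tracks -> Drums; GM family 0 -> Piano;
    family 3 -> Guitar; family 4 -> Bass; anything else -> Strings.
    """
    groups: Dict[str, List[int]] = {name: [] for name in CATEGORIES}
    for idx, (program, is_drum) in enumerate(tracks):
        family = program // 8
        if is_drum:
            groups["Drums"].append(idx)
        elif family == 0:
            groups["Piano"].append(idx)
        elif family == 3:
            groups["Guitar"].append(idx)
        elif family == 4:
            groups["Bass"].append(idx)
        else:
            groups["Strings"].append(idx)
    return groups
```

The pianorolls of the tracks in each group could then be combined (for instance by an element-wise maximum) into one pianoroll per category, giving a fixed last axis of size 5 regardless of how many tracks the original file had.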