tuwien-musicir / rp_extract

Rhythm Pattern music feature extractor by IFS @ TU-Vienna
GNU General Public License v3.0

decode mp3 files into memory instead of disk #17

Closed audiofeature closed 8 years ago

audiofeature commented 8 years ago

will most likely speed up rp_extract_batch.py a lot!

An example of how to have ffmpeg pipe the decoded data into memory, where it is turned into a numpy array (from http://www.ofai.at/~jan.schlueter/code/augment/ ):

    import subprocess

    import numpy as np


    def read_ffmpeg(infile, sample_rate, cmd='ffmpeg'):
        """
        Decodes a given audio file using ffmpeg, resampled to a given
        sample rate, downmixed to mono, and converted to float32 samples.
        Returns a numpy array.
        """
        call = [cmd, "-v", "quiet", "-i", infile, "-f", "f32le",
                "-ar", str(sample_rate), "-ac", "1", "pipe:1"]
        # ffmpeg writes the raw samples to stdout; capture them in memory
        samples = subprocess.check_output(call)
        return np.frombuffer(samples, dtype=np.float32)

slychief commented 8 years ago

Great!!!

Maybe this will also work with PyMedia:

http://pymedia.org/tut/dump_wav.html

audiofeature commented 8 years ago

Can Pymedia decode MP3, OGG, M4A, AAC... natively? (without ffmpeg, lame or mpg123)

audiofeature commented 8 years ago

It seems not: http://indashpc.org/vbullettin/viewtopic.php?t=31 IMO it just adds more dependencies, with no real advantage over the current solution.

audiofeature commented 8 years ago

I have successfully tested the example code, BUT:

  1. The pipe approach seems to be Linux-only. I found a hint for Windows, but it seems to need 2 different implementations: http://stackoverflow.com/questions/32157774/ffmpeg-output-pipeing-to-named-windows-pipe
  2. ffmpeg does not return the sample rate and number of channels. In the example above, the audio is always converted to a fixed sample rate and number of channels, and to the "f32le" sample format (which makes sense for numpy).
  3. Sample rate and number of channels can be obtained with ffprobe -v quiet -show_streams -of json <input_file> (which already emits JSON instead of plain text, but the JSON still needs to be parsed). => It seems that calling 2 commands plus parsing JSON will kill the potential speedup that we wanted to achieve by decoding directly into memory :-(
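
The ffprobe step from point 3 could look roughly like the sketch below. The function names parse_audio_info and probe_audio are hypothetical and not part of rp_extract; the sketch assumes ffprobe is on the PATH and that its JSON output lists the audio stream with a "sample_rate" (as a string) and a "channels" field.

```python
import json
import subprocess


def parse_audio_info(ffprobe_json):
    """Return (sample_rate, channels) of the first audio stream
    found in the JSON text produced by ffprobe -show_streams -of json."""
    info = json.loads(ffprobe_json)
    for stream in info.get("streams", []):
        if stream.get("codec_type") == "audio":
            # ffprobe reports sample_rate as a string, channels as an integer
            return int(stream["sample_rate"]), int(stream["channels"])
    raise ValueError("no audio stream found")


def probe_audio(infile, cmd="ffprobe"):
    """Run ffprobe on infile (hypothetical helper) and return
    (sample_rate, channels)."""
    call = [cmd, "-v", "quiet", "-show_streams", "-of", "json", infile]
    return parse_audio_info(subprocess.check_output(call))
```

Keeping the parsing separate from the subprocess call makes it possible to check the JSON handling without having ffprobe installed.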

audiofeature commented 8 years ago

In fact, not even the piping to memory seems to bring any speedup:

    Decoded .mp3 with: ffmpeg -v 1 -y -i music/BoxCat_Games_-_10_-_Epic_Song.mp3 /tmp/ebc262ce-3c88-4ac8-950b-ea5db2836bba.wav
    0.291723012924
    (2421504, 2)
    Decoding with: ffmpeg -v 1 -y -i music/BoxCat_Games_-_10_-_Epic_Song.mp3 -f f32le pipe:1
    0.301044940948
    (4843008,)

(The float number is the time in seconds to execute the command.)

audiofeature commented 8 years ago

If I call the command multiple times, the speedup is like 0.253051042557 vs. 0.300421953201. But that still does not include the ffprobe command.

I will give up on this for now.
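
A comparison like the one above can be repeated with a small harness such as the following sketch. It is not the script used in this thread; time_command is a hypothetical name, and the demo command is only a stand-in so that the sketch runs without ffmpeg. To reproduce the test, pass the two ffmpeg command lines as argument lists instead.

```python
import subprocess
import sys
import time


def time_command(call, repeats=5):
    """Run a command several times and return the list of
    wall-clock durations in seconds, one per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check_output reads stdout completely, so output sent to
        # "pipe:1" would be fully transferred into memory before we stop
        subprocess.check_output(call)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # stand-in command; replace with the ffmpeg calls to compare
    demo = [sys.executable, "-c", "pass"]
    times = time_command(demo)
    print("first run: %.4f s, best of %d: %.4f s"
          % (times[0], len(times), min(times)))
```

Reporting the first run separately from the best run helps to tell a warm file cache apart from a real difference between the two decoding paths.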

slychief commented 8 years ago

As far as I understand it, the ffmpeg output is a string and decoding that string might be computationally expensive.

audiofeature commented 8 years ago

No, it's binary and can be passed directly to a numpy array (see the example; I also saw it on stdout). But maybe the pipe mechanism internally dumps to disk, or there is some other reason why it is just as slow as decoding to disk.
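
To illustrate the point about binary data: the raw f32le bytes need no parsing, and np.frombuffer only reinterprets them. With more than one channel, ffmpeg's raw output is interleaved, so the flat array has to be reshaped to get one column per channel. The sketch below uses synthetic bytes in place of real ffmpeg output, and bytes_to_samples is a hypothetical helper name.

```python
import numpy as np


def bytes_to_samples(raw, channels):
    """Interpret raw little-endian float32 bytes as samples and
    return an array of shape (frames, channels)."""
    # frombuffer creates a view on the bytes; nothing is decoded or copied
    samples = np.frombuffer(raw, dtype="<f4")
    return samples.reshape(-1, channels)


# synthetic stand-in for ffmpeg's stdout: 3 stereo frames, interleaved L/R
raw = np.array([0.0, 0.5, -0.5, 1.0, 0.25, -0.25], dtype="<f4").tobytes()
stereo = bytes_to_samples(raw, channels=2)
print(stereo.shape)  # (3, 2)
```

Because the channel count is not contained in the raw stream, it has to be either forced on the ffmpeg command line (as with "-ac" in the example at the top) or obtained separately, for instance via ffprobe.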

slychief commented 8 years ago

Maybe we should switch to mpg123:

http://multimedia.cx/eggs/gcc-of-multimedia/

https://hydrogenaud.io/index.php/topic,98379.0.html

audiofeature commented 8 years ago

I actually DID use mpg123 ALL THE TIME, had to install FFMPEG freshly now to do this test! In fact mpg123 is faster by 33%! The only advantage of ffmpeg is that it supports maaaany file formats other than mp3.

audiofeature commented 8 years ago

(Note that our code already supports mpg123 and lame anyway; you only need to have one of the 3 decoders installed.)

audiofeature commented 8 years ago

I have now extended the timing test to split decode_to_memory into two parts:

    Decoded .mp3 with: ffmpeg -v 1 -y -i music/BoxCat_Games_-_10_-_Epic_Song.mp3 /tmp/e23ccfdc-43cc-4df9-bff4-b2b45f1e61e9.wav
    0.294055938721
    Decoding with: ffmpeg -v 1 -y -i music/BoxCat_Games_-_10_-_Epic_Song.mp3 -f f32le pipe:1
    0.28061413765
    To NP Array: 1.90734863281e-05

So it takes 0.28 instead of 0.29 sec to decode into memory instead of to a file, and only about 19 microseconds for the np.frombuffer() command to convert it to a numpy array!

So my conclusion is that it's still a problem of the pipe mechanism (which, again, is different on Windows).

I give up now.

audiofeature commented 8 years ago

This is how JKU does it (quite similar):

http://madmom.readthedocs.org/en/latest/_modules/madmom/audio/ffmpeg.html#decode_to_memory

or

http://madmom.readthedocs.org/en/latest/modules/audio/ffmpeg.html#madmom.audio.ffmpeg.load_ffmpeg_file

The second one uses ffprobe to get the sample_rate and no. of channels (as I had planned/tested it - however without any speedup)

audiofeature commented 8 years ago

Sebastian Böck replied and confirmed that decoding a 1-minute file takes about 23 ms, plus 1 ms to store it to disk and 0.5 ms to read it back. So in total the disk I/O is 1.5 ms, which I consider negligible (especially as the additional ffprobe command would probably consume those 1.5 ms). The only reason to decode to memory is to reduce disk I/O (for reasons other than time, such as battery?).

For now I will not consider this issue as an enhancement and will close it.