Closed audiofeature closed 8 years ago
Can Pymedia decode MP3, OGG, M4A, AAC... natively? (without ffmpeg, lame or mpg123)
It seems not: http://indashpc.org/vbullettin/viewtopic.php?t=31
IMO it just adds more dependencies, with no real advantage over the current solution.
I have successfully tested the example code, BUT:
ffprobe -v quiet -show_streams -of json <input_file>
(which already converts plain text to json, but then the json needs to be parsed).
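The ffprobe step above (run the command, then parse its JSON) could look roughly like the sketch below. The helper names `parse_audio_stream` and `probe_audio` are made up for illustration, and the sketch assumes the JSON contains a `streams` list whose audio entry carries `sample_rate` and `channels`.

```python
import json
import subprocess


def parse_audio_stream(ffprobe_json):
    """Return (sample_rate, channels) of the first audio stream in ffprobe's JSON output."""
    info = json.loads(ffprobe_json)
    for stream in info.get("streams", []):
        if stream.get("codec_type") == "audio":
            # ffprobe reports sample_rate as a string, so convert it explicitly
            return int(stream["sample_rate"]), int(stream["channels"])
    raise ValueError("no audio stream found")


def probe_audio(path, cmd="ffprobe"):
    """Run ffprobe on `path` (requires ffprobe on the PATH) and parse the result."""
    call = [cmd, "-v", "quiet", "-show_streams", "-of", "json", path]
    return parse_audio_stream(subprocess.check_output(call))
```

The parsing itself is cheap. The extra cost comes from starting a second process for each file.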
=> It seems that calling 2 commands + parsing JSON will kill the potential speedup that we wanted to achieve by decoding directly into memory :-(
In fact, not even the piping to memory seems to bring any speedup:
Decoded .mp3 with: ffmpeg -v 1 -y -i music/BoxCat_Games_-_10_-_Epic_Song.mp3 /tmp/ebc262ce-3c88-4ac8-950b-ea5db2836bba.wav
0.291723012924
(2421504, 2)
Decoding with: ffmpeg -v 1 -y -i music/BoxCat_Games_-_10_-_Epic_Song.mp3 -f f32le pipe:1
0.301044940948
(4843008,)
(float number is the time in seconds to execute the command)
If I call the commands multiple times, the timings are about 0.253051042557 vs. 0.300421953201. But the ffprobe command is still missing from that measurement.
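Since the timings vary from run to run, a small helper that repeats a call and keeps the fastest run makes such comparisons less noisy. This is only a sketch, and `best_time` is a made-up name, not something from the project:

```python
import time


def best_time(func, repeats=5):
    """Call func() `repeats` times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return best, result
```

It could be wrapped around either variant, e.g. `best_time(lambda: subprocess.check_output(call))` for the pipe command, with `call` being the argument list of the ffmpeg command.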
I will give up on this for now.
As far as I understand it, the ffmpeg output is a string and decoding that string might be computationally expensive.
No, it's binary and can be passed directly to a numpy array (see example, I also saw it on stdout). But maybe the pipe mechanism internally dumps to disk, or there is some other reason why it is just as slow as decoding to disk.
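To illustrate the point about binary output: raw f32le bytes can be viewed as a float32 array without any string parsing. The sketch below assumes the samples are interleaved per channel, which would be consistent with the flat (4843008,) shape from the pipe versus (2421504, 2) from the WAV file in the test above. `pcm_to_array` is a hypothetical helper name.

```python
import numpy as np


def pcm_to_array(raw, channels):
    """Interpret raw f32le bytes as float32 samples, shaped (n_frames, channels)."""
    samples = np.frombuffer(raw, dtype="<f4")  # little-endian float32 view, no copy
    if channels > 1:
        # assumed layout: interleaved samples (L, R, L, R, ...)
        samples = samples.reshape(-1, channels)
    return samples
```

For mono input the array stays one-dimensional, as in the read_ffmpeg example.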
Maybe we should switch to mpg123:
I actually DID use mpg123 ALL THE TIME, had to install FFMPEG freshly now to do this test! In fact mpg123 is faster by 33%! The only advantage of ffmpeg is that it supports maaaany file formats other than mp3.
(Note that our code already supports mpg123 and lame anyway; you only need one of the 3 decoders installed.)
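Picking whichever decoder is installed could be sketched like this. It is not the project's actual selection code; `find_decoder` and the preference order (mpg123 first, since it was measured as faster here) are assumptions for illustration.

```python
import shutil


def find_decoder(candidates=("mpg123", "ffmpeg", "lame"), which=shutil.which):
    """Return the first candidate executable found on the PATH, or None."""
    for name in candidates:
        if which(name):
            return name
    return None
```

The `which` parameter only exists so the lookup can be replaced in tests.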
I extended the timing test now to split the decode_to_memory into two parts:
Decoded .mp3 with: ffmpeg -v 1 -y -i music/BoxCatGames-10-_Epic_Song.mp3 /tmp/e23ccfdc-43cc-4df9-bff4-b2b45f1e61e9.wav
0.294055938721
Decoding with: ffmpeg -v 1 -y -i music/BoxCatGames-10-_Epic_Song.mp3 -f f32le pipe:1
0.28061413765
To NP Array: 1.90734863281e-05
So it takes 0.28 instead of 0.29 sec to decode into memory instead of to a file, and only about 19 microseconds for the np.frombuffer() command to convert it to a NumPy array!
So my conclusion is that it's still a problem of the pipe mechanism (which, again, is different on Windows).
I give up now.
This is how JKU does it (quite similar):
http://madmom.readthedocs.org/en/latest/_modules/madmom/audio/ffmpeg.html#decode_to_memory
or
The second one uses ffprobe to get the sample_rate and no. of channels (as I had planned/tested it - however without any speedup)
Sebastian Böck replied and confirms that decoding a 1 minute file takes about 23 ms + 1 ms to store to disk + 0.5 ms to read. So in total the disk I/O is 1.5 ms, which I consider negligible (especially as the additional ffprobe command would probably consume those 1.5 ms). The only reason to decode to memory is to reduce disk I/O (for other reasons than time? such as battery?)
For now I will not consider this issue as an enhancement and will close it.
will most likely speed up rp_extract_batch.py a lot!
An example of how ffmpeg pipes the data into memory to be transformed into a numpy array (from http://www.ofai.at/~jan.schlueter/code/augment/ ):
    import subprocess

    import numpy as np


    def read_ffmpeg(infile, sample_rate, cmd='ffmpeg'):
        """
        Decodes a given audio file using ffmpeg, resampled to a given
        sample rate, downmixed to mono, and converted to float32 samples.
        Returns a numpy array.
        """
        # "-f f32le" writes raw little-endian float32 samples; "pipe:1" sends them to stdout
        call = [cmd, "-v", "quiet", "-i", infile, "-f", "f32le",
                "-ar", str(sample_rate), "-ac", "1", "pipe:1"]
        samples = subprocess.check_output(call)
        return np.frombuffer(samples, dtype=np.float32)