jiaaro opened this issue 8 years ago
Hi @jiaaro! Any news about `StreamingAudioSegment`? :)
I think it would be in high demand.
@nickmetal so far no news - it appears to be a relatively big project, and as the transition to 64-bit continues and machines get more RAM, the need for it seems to be slowly going away. For example, I recently loaded an audio segment with ~8 hours of "CD quality" audio. It used a lot of RAM, sure, but it worked.
Can you comment further on what you would use it for?
As far as the memory error goes, switching to a 64-bit version of Python fixed it, since 32-bit Python has a 4 GB address-space limit.
Hopefully this helps :)
@jiaaro try longer audio files. In my case pydub returned sweet nothings after 9 hours and 26 minutes. At one point RAM usage spiked at 15.9 GB but then went down steadily while splitting the file. Perhaps the whole file wasn't fully loaded into memory.
Any news on this issue?
Any update?
Ran into this issue on another project. My audio file is small enough to load fully into memory (a 1 GB mp3), but the caveat is that it is about 70 hours of audio. I have 32 GB of RAM, so it should still be enough. I'd suggest as a compromise loading the entire mp3 into RAM but lazily decoding audio chunks as needed. That way it can still fully fit in RAM with a minor loss of efficiency.
I am also interested in this feature.
Throwing in my vote to get the feature added.
I noticed an issue: a large mp3 output file (~290 MB) is written correctly, but when I change the export call to produce `wav`, the output wav file is not written in its entirety (it stops at 3.1 GB, and some of the last songs are missing - my script adds and crossfades mp3 files from a directory).
File system: Linux ext4; 16 GB RAM
```python
# let's save it!
with open("%s_minute_playlist.wav" % playlist_length, 'wb') as out_f:
    # playlist.export(out_f, format="mp3")
    playlist.export(out_f, format="wav")
```
Any progress on plans for this feature? Currently I am faced with the decision of either adding excessive amounts of RAM to an AWS server for the occasional longer file, or just not processing them, which is not really an option.
@dyyd would you mind sharing which AudioSegment methods you call, and what audio formats you use?
Perhaps it would be possible to start with a "StreamingAudioSegment" that offers a subset of the current features in AudioSegment.
Resolved our current issue by using ffmpeg-python directly and piping the binary output to the AudioSegment constructor.
We are cutting sections (small ones compared to full file) out of multi hour long files and reusing them later for combining and reordering etc.
As a start, it would be nice if file loading accepted trim start and end positions, since that is essentially the solution we went with.
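For reference, the trim-at-load approach can be done today by placing ffmpeg's seek options before `-i` and piping decoded WAV to stdout; the resulting bytes can then be handed to `AudioSegment.from_file` via `io.BytesIO`. The helper names below are made up for illustration:

```python
import subprocess

def ffmpeg_trim_cmd(path, start_s, duration_s):
    # -ss/-t before -i makes ffmpeg seek using the container index
    # instead of decoding everything up to start_s.
    return [
        "ffmpeg", "-v", "error",
        "-ss", str(start_s), "-t", str(duration_s),
        "-i", path,
        "-f", "wav", "pipe:1",
    ]

def load_trimmed_wav_bytes(path, start_s, duration_s):
    """Decode only the requested window; returns WAV bytes from stdout."""
    result = subprocess.run(
        ffmpeg_trim_cmd(path, start_s, duration_s),
        capture_output=True, check=True,
    )
    return result.stdout
```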
I had a similar use case as @dyyd above, and will be using ffmpeg-python for some preprocessing. Perhaps a reference list of "recipes" for people would be helpful -- e.g. I couldn't figure out how to pipe binary from ffmpeg-python to a constructor, so ended up using a temp file.
Code in case it helps anyone:
```python
import ffmpeg
import pydub
from tempfile import NamedTemporaryFile

def audiosegment_from_mp3_time_range(path_to_mp3, starttime_s, duration_s):
    seg = None
    with NamedTemporaryFile("w+b", suffix=".mp3") as f:
        ffmpeg_cmd = (
            ffmpeg
            .input(path_to_mp3, ss=starttime_s, t=duration_s)
            .output(f.name, acodec='copy')
            .overwrite_output()
        )
        ffmpeg_cmd.run()
        # print(f'wrote {f.name}')
        seg = pydub.AudioSegment.from_mp3(f.name)
    return seg
```
I had the same idea of trimming with start and end positions:

```python
AudioSegment.from_file(
    "video.mp4",
    start_second=7000,
    duration=100,
)
```
But this took surprisingly long!
From analysing the command line, I realized that AudioSegment.from_file() was specifying the input file first and the seek parameters after it:

```
'ffmpeg', '-y', '-i', 'video.mp4', '-ss', '7000', '-t', '100', ...
```

But reading about seeking in FFmpeg at https://trac.ffmpeg.org/wiki/Seeking, I learned that, as of FFmpeg 2.1, when transcoding with ffmpeg (i.e. not stream copying), `-ss` is "frame-accurate" even when used as an input option. So I tried the following command instead:

```
'ffmpeg', '-y', '-ss', '7000', '-t', '100', '-i', 'video.mp4', ...
```

... and the video file was processed dramatically faster!
So I modified audio_segment.py and created a pull request here: https://github.com/jiaaro/pydub/pull/729
I have now made another modification to audio_segment.py: I added a pull request https://github.com/jiaaro/pydub/pull/734, which in case the AudioSegment is bigger than 4 GB, exports into separate temporary WAV files < 4 GB and then combines the separate files with FFMPEG into the final compressed format. In my opinion, this solves the issue of exporting large audio files.
I am still facing issues exporting large audio files. I am working with 4-channel Ambisonics files, ~4 GB in size. Loading and preprocessing are fine (no memory limitation), but I am getting:

```
error: 'L' format requires 0 <= number <= 4294967295
```

in wave.py. Any help? @jiaaro pinging you here to see if there is any progress on StreamingAudioSegment.
For now I will keep saving chunks and joining them, or use soundfile.write(), which handles this!
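For context, that `'L'` struct error is the WAV container's own ceiling: RIFF chunk sizes are unsigned 32-bit integers, so no single WAV file can describe more than 2^32 - 1 bytes of audio data, regardless of pydub. A rough capacity check (the helper is illustrative, and the 48 kHz / 16-bit format is an assumption):

```python
MAX_RIFF_BYTES = 2**32 - 1  # RIFF/WAV sizes are unsigned 32-bit ('L' in struct)

def max_wav_seconds(channels, frame_rate, sample_width_bytes):
    """Longest audio a single WAV file can hold at this format."""
    bytes_per_second = channels * frame_rate * sample_width_bytes
    return MAX_RIFF_BYTES / bytes_per_second

# e.g. 4-channel audio at 48 kHz / 16-bit caps out near 11,184 s (~3.1 h)
limit_s = max_wav_seconds(4, 48000, 2)
```

Anything past that limit has to be split across files (or written with a 64-bit-capable container such as W64/RF64, which soundfile supports).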
Until this is built, some users will be able to get away with breaking the audio into chunks like in my comment on #124
Try implementing the `StreamingAudioSegment` class, which should be an API-compatible implementation of `AudioSegment` (which may use `AudioSegment` internally?) providing the same interface/methods, but without loading the complete audio into RAM. If there are significant roadblocks there, perhaps just utilities which do individual memory-intensive operations (not as nice a solution).

Outline of an approach:

The new `VeryMemoryConsciousAudioSegment` (still workshopping names) could do the audio conversions up front (like they are now) to standard wave data on disk in temp files. All operations on these instances would just pile up in a list until the moment the actual audio data is needed (like an export, or retrieving info like duration or loudness). When the audio data is needed, all pending operations would be applied and the result stored in a new temp file on disk, to avoid reapplying the operations over and over.
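The pile-up-then-apply part of that outline can be sketched with a small queue of pending operations (hypothetical names, not a proposed pydub API; the real class would operate on temp files on disk rather than in-memory data):

```python
class LazySegment:
    """Defers operations until the audio data is actually needed."""

    def __init__(self, data):
        self._data = data
        self._pending = []  # operations queued in application order

    def apply(self, op):
        self._pending.append(op)
        return self  # allow chaining, like pydub's fluent style

    def materialize(self):
        # Run all queued operations once and cache the result,
        # so repeated exports don't reapply them.
        for op in self._pending:
            self._data = op(self._data)
        self._pending.clear()
        return self._data
```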
As I think more about this, it seems like this has some downsides (much more disk intensive, harder to do operations that inspect the audio data, like getting loudness). I'm becoming more convinced that the current in-memory AudioSegment will need to stick around for some uses even if we get to a completely feature-complete streaming/on-disk implementation.
note: I was originally going to commandeer #124, then #51, and finally settled on adding a new ticket.
Also related: #101