jiaaro / pydub

Manipulate audio with a simple and easy high level interface
http://pydub.com
MIT License

Add Support for large audio files ( > 2GB) #135

Open jiaaro opened 8 years ago

jiaaro commented 8 years ago

Until this is built, some users will be able to get away with breaking the audio into chunks, as in my comment on #124.
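Roughly, that workaround looks like this (a sketch only, not the exact code from #124; the filenames and the 10-minute chunk size are arbitrary):

import glob
import subprocess
from pydub import AudioSegment

# Let ffmpeg split the input into 10-minute pieces on disk, so the
# whole file is never decoded into RAM at once.
subprocess.run([
    "ffmpeg", "-i", "huge_input.wav",
    "-f", "segment", "-segment_time", "600",
    "-c", "copy", "chunk_%03d.wav",
], check=True)

# Then load and process each piece independently.
for path in sorted(glob.glob("chunk_*.wav")):
    chunk = AudioSegment.from_file(path)
    # ... operate on `chunk` and export it ...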

Try implementing the StreamingAudioSegment class: an API-compatible implementation of AudioSegment (which may use AudioSegment internally?) that provides the same interface/methods but does not load the complete audio into RAM. If there are significant roadblocks there, perhaps just ship utilities that perform the individual memory-intensive operations (not as nice a solution).

Outline of an approach:

The new VeryMemoryConsciousAudioSegment (still workshopping names) could do the audio conversions up front (as they are now) to standard wave data on disk in temp files. All operations on these instances would just pile up in a list until the moment the actual audio data is needed (for an export, or to retrieve info like duration or loudness).

When the audio data is needed, all pending operations would be applied and the result stored in a new temp file on disk in order to avoid reapplying the operations over and over.

As I think more about this, it seems like this approach has some downsides (it's much more disk-intensive, and it's harder to do operations that inspect the audio data, like getting loudness). I'm becoming more convinced that the current in-memory AudioSegment will need to stick around for some uses even if we get to a completely feature-complete streaming/on-disk implementation.
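
To make the bookkeeping concrete, here's a minimal sketch of the deferred-operation idea (names like LazyAudioSegment are placeholders, only one operation is wired up, and it materializes in memory rather than to the temp files described above):

from pydub import AudioSegment

class LazyAudioSegment:
    """Queues operations and only decodes/applies them on demand."""

    def __init__(self, source_path):
        self._source_path = source_path
        self._pending_ops = []   # (method_name, args, kwargs) tuples
        self._cached = None      # materialized AudioSegment, if any

    def apply_gain(self, volume_change):
        # Record the operation instead of applying it immediately.
        self._pending_ops.append(("apply_gain", (volume_change,), {}))
        self._cached = None
        return self

    def _materialize(self):
        # Decode the source and replay pending operations only when the
        # actual audio data is needed; cache the result so operations
        # aren't reapplied over and over.
        if self._cached is None:
            seg = AudioSegment.from_file(self._source_path)
            for name, args, kwargs in self._pending_ops:
                seg = getattr(seg, name)(*args, **kwargs)
            self._cached = seg
        return self._cached

    @property
    def duration_seconds(self):
        return self._materialize().duration_seconds

    def export(self, *args, **kwargs):
        return self._materialize().export(*args, **kwargs)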


note: I was originally going to commandeer #124, then #51, and finally settled on adding a new ticket.

Also related: #101

nickmetal commented 7 years ago

Hi @jiaaro! Is there any news about StreamingAudioSegment? :) I think it would be very much in demand.

jiaaro commented 7 years ago

@nickmetal so far no news - it appears to be a relatively big project, and as the transition to 64-bit continues and machines get more RAM, the need for it seems to be slowly going away. For example, I recently loaded an audio segment with ~8 hours of "CD quality" audio (at 44.1 kHz, 16-bit stereo, that's about 176 KB/s, or roughly 5 GB of raw PCM). It used a lot of RAM, sure, but it worked.

Can you comment further on what you would use it for?

mission commented 7 years ago

As for the memory error, switching to a 64-bit version of Python fixed it, since a 32-bit process has a 4 GB address-space limit.
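
(If you're not sure which build you're running, a quick check:)

import struct
print(struct.calcsize("P") * 8)  # prints 64 on a 64-bit Python build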

Hopefully this helps :)

kamisori commented 7 years ago

@jiaaro try longer audio files. In my case pydub returned sweet nothings after 9 hours and 26 minutes. At one point RAM usage spiked to 15.9 GB but then went down steadily while splitting the file. Perhaps the whole file wasn't fully loaded into memory.

exit99 commented 6 years ago

Any news on this issue?

lcaresia commented 5 years ago

Any update?

tensorfoo commented 5 years ago

Ran into this issue on another project. My audio file is small enough to load fully into memory (a 1 GB MP3), but the caveat is that it's about 70 hours of audio. I have 32 GB of RAM, so that should still be enough. As a compromise, I'd suggest loading the entire MP3 into RAM but lazily decoding audio chunks as needed. That way it can still fit fully in RAM with only a minor loss of efficiency.

JoshMayberry commented 4 years ago

I am also interested in this feature.

Razator73 commented 4 years ago

Throwing in my vote to get the feature added.

Wikinaut commented 4 years ago

I noticed an issue where a large MP3 output file (~290 MB) is apparently written correctly, but when I change the export call to produce WAV, the output WAV file is not written in its entirety: it stops at 3.1 GB, and some of the last songs are missing (my script adds and crossfades MP3 files from a directory).

File system: Linux ext4; 16 GB RAM

# let's save it!
with open("%s_minute_playlist.wav" % playlist_length, 'wb') as out_f:
    # playlist.export(out_f, format="mp3")
    playlist.export(out_f, format="wav")

Any idea, why?

dyyd commented 4 years ago

Any progress on plans for this feature? I'm currently faced with the decision of either adding an excessive amount of RAM to an AWS server for the occasional longer file, or just not processing those files, which is not really an option.

jiaaro commented 4 years ago

@dyyd would you mind sharing which AudioSegment methods you call, and what audio formats you use?

Perhaps it would be possible to start with a "StreamingAudioSegment" that offers a subset of the current AudioSegment features.

dyyd commented 4 years ago

We resolved our current issue with raw ffmpeg-python usage, piping the binary output to the AudioSegment constructor. We are cutting sections (small ones compared to the full file) out of multi-hour files and reusing them later for combining, reordering, etc. As a start, it would be nice if it were possible to provide trimming start and end positions when loading a file, since that is basically the solution we went with.
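
Roughly, our approach looks like this (a sketch with made-up paths and a fixed 44.1 kHz stereo window, not our production code):

import ffmpeg
from pydub import AudioSegment

# Decode only the wanted window to raw PCM on stdout; ss/t are passed
# as input options, so ffmpeg seeks before decoding.
out_bytes, _ = (
    ffmpeg
    .input("multi_hour_file.mp3", ss=3600, t=30)
    .output("pipe:", format="s16le", acodec="pcm_s16le", ac=2, ar="44100")
    .run(capture_stdout=True, capture_stderr=True)
)

# Hand the raw bytes straight to the AudioSegment constructor.
segment = AudioSegment(data=out_bytes, sample_width=2,
                       frame_rate=44100, channels=2)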

jzohrab commented 2 years ago

I had a similar use case to @dyyd above, and will be using ffmpeg-python for some preprocessing. Perhaps a reference list of "recipes" would be helpful for people -- e.g. I couldn't figure out how to pipe binary from ffmpeg-python to a constructor, so I ended up using a temp file.

Code in case it helps anyone:

import ffmpeg
import pydub
from tempfile import NamedTemporaryFile

def audiosegment_from_mp3_time_range(path_to_mp3, starttime_s, duration_s):
    seg = None
    with NamedTemporaryFile("w+b", suffix=".mp3") as f:
        # Pass -ss/-t as input options so ffmpeg seeks before decoding;
        # acodec='copy' copies the mp3 frames without re-encoding.
        ffmpeg_cmd = (
            ffmpeg
            .input(path_to_mp3, ss=starttime_s, t=duration_s)
            .output(f.name, acodec='copy')
            .overwrite_output()
        )
        ffmpeg_cmd.run()
        # Load the trimmed temp file while it still exists.
        seg = pydub.AudioSegment.from_mp3(f.name)
    return seg

inatuwe commented 1 year ago

I had the same idea of trimming start and end positions with:

AudioSegment.from_file(
    file = "video.mp4",
    start_seconds = 7000,
    duration = 100,
)

But this took surprisingly long!

From analysing the command line I realized that AudioSegment.from_file() was first specifying the input file and then the seek parameters:

'ffmpeg', '-y', '-i', 'video.mp4', '-ss', '7000', '-t', '100', ...

But reading about the seek parameter at https://trac.ffmpeg.org/wiki/Seeking, I learned that, as of FFmpeg 2.1, when transcoding with ffmpeg (i.e. not stream copying), -ss is also "frame-accurate" even when used as an input option...

So I tried instead the following command:

'ffmpeg', '-y', '-ss', '7000', '-t', '100', '-i', 'video.mp4', ...

... and the video file was processed much, much faster!

So I modified audio_segment.py and created a pull request here: https://github.com/jiaaro/pydub/pull/729

inatuwe commented 1 year ago

I have now made another modification to audio_segment.py: I added pull request https://github.com/jiaaro/pydub/pull/734, which, when the AudioSegment is bigger than 4 GB, exports it into separate temporary WAV files each under 4 GB and then combines them with FFmpeg into the final compressed format. In my opinion, this solves the issue of exporting large audio files.
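
The idea in sketch form (not the actual code from the PR; the one-hour part length and filenames are arbitrary):

import subprocess
from pydub import AudioSegment

def export_large(seg, out_path, part_ms=60 * 60 * 1000):
    # Write the segment as several WAV parts, each safely under 4 GB.
    parts = []
    for i, start in enumerate(range(0, len(seg), part_ms)):
        name = "part_%03d.wav" % i
        seg[start:start + part_ms].export(name, format="wav")
        parts.append(name)
    # Join the parts with ffmpeg's concat demuxer and compress into
    # the final output format.
    with open("parts.txt", "w") as f:
        f.writelines("file '%s'\n" % p for p in parts)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", "parts.txt", out_path],
        check=True,
    )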

cvillela commented 11 months ago

I am still facing issues exporting large audio files, though. I am working with 4-channel Ambisonics files, ~4 GB in size. Loading and preprocessing are fine (no memory limitation), but on export I am getting:

error: 'L' format requires 0 <= number <= 4294967295

This comes from wave.py. Any help? @jiaaro, pinging you here to see if there is any progress on StreamingAudioSegment.

cvillela commented 11 months ago

For now I will keep saving chunks and joining them, or use soundfile.write(), which handles this!
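
For anyone else landing here, roughly what the soundfile route looks like (a sketch; the RF64 container and 16-bit subtype are assumptions, adjust for your data):

import numpy as np
import soundfile as sf
from pydub import AudioSegment

seg = AudioSegment.from_file("ambisonics_input.wav")  # hypothetical 4-channel file

# Interleaved samples -> (frames, channels) array for soundfile.
samples = np.array(seg.get_array_of_samples()).reshape(-1, seg.channels)

# RF64 uses 64-bit size fields, so it isn't capped at 4 GB like the
# stdlib wave module's plain RIFF/WAVE output.
sf.write("output.rf64", samples, seg.frame_rate,
         subtype="PCM_16", format="RF64")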