spotify / pedalboard

🎛 🔊 A Python library for audio.
https://spotify.github.io/pedalboard
GNU General Public License v3.0

Avoid unnecessarily taking the GIL when using ResampledReadableAudioFile. #293

Closed: psobot closed this PR 7 months ago

psobot commented 7 months ago

Similar to #291, this PR avoids unnecessarily acquiring the GIL (and making some unnecessary copies) when reading audio through `ResampledReadableAudioFile` (i.e., the object returned by `.resampled_to(...)`).

Prior to this PR, calling `.read()` on a `ResampledReadableAudioFile` called the `.read(...)` method on its underlying `ReadableAudioFile`. That method acquired the GIL, so every read from a resampled audio file was GIL-bound, even when the underlying I/O didn't require the GIL at all (e.g., reading from a `memoryview` or a raw file handle).
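The effect described above can be modeled in pure Python. This is an analogy, not pedalboard's C++ implementation: a `threading.Lock` named `FAKE_GIL` (hypothetical) stands in for the GIL, and `time.sleep` stands in for I/O that doesn't need it. The helpers `read_chunk` and `run_parallel` are also hypothetical. When every read holds the lock, ten concurrent readers queue up behind one another; when the read skips the lock, they overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the GIL: one process-wide lock.
FAKE_GIL = threading.Lock()


def read_chunk(hold_gil: bool, io_seconds: float = 0.05) -> None:
    """Simulate one read whose underlying I/O takes `io_seconds`."""
    if hold_gil:
        # Old behaviour (analogy): the whole read runs with the lock held,
        # so concurrent readers are serialized.
        with FAKE_GIL:
            time.sleep(io_seconds)
    else:
        # New behaviour (analogy): the I/O runs without the lock,
        # so concurrent readers can overlap.
        time.sleep(io_seconds)


def run_parallel(hold_gil: bool, num_threads: int = 10) -> float:
    """Return wall-clock seconds for `num_threads` concurrent reads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(num_threads) as executor:
        futures = [executor.submit(read_chunk, hold_gil) for _ in range(num_threads)]
        for future in futures:
            future.result()
    return time.perf_counter() - start


if __name__ == "__main__":
    with_lock = run_parallel(hold_gil=True)
    without_lock = run_parallel(hold_gil=False)
    print(f"holding the lock:     {with_lock:.2f}s")
    print(f"not holding the lock: {without_lock:.2f}s")
```

With the lock held, the ten 50 ms reads take at least 0.5 s in total; without it, they finish in roughly the time of a single read.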

psobot commented 7 months ago

Confirmed that with this change and #291, reading audio files in parallel threads is now much faster due to lower GIL contention (in the run below, threaded reads went from 0.21x the speed of serial reads to 2.34x faster):

```text
Testing with Pedalboard 0.8.9
Creating test audio file of 158,760,000 frames at 44.1kHz...
Created test audio file of size 317,520,104 bytes.
Seeking and reading files in 10 threads...
Done reading from <_io.BytesIO object at 0x105794540> in 5.5560 seconds.
Done reading from <_io.BytesIO object at 0x105779c10> in 5.9574 seconds.
Done reading from <_io.BytesIO object at 0x10577b920> in 5.9675 seconds.
Done reading from <_io.BytesIO object at 0x1057944a0> in 5.9752 seconds.
Done reading from <_io.BytesIO object at 0x105794450> in 6.0193 seconds.
Done reading from <_io.BytesIO object at 0x1057944f0> in 6.0307 seconds.
Done reading from <_io.BytesIO object at 0x105794590> in 6.0268 seconds.
Done reading from <_io.BytesIO object at 0x10577a110> in 6.0324 seconds.
Done reading from <_io.BytesIO object at 0x1033d66b0> in 6.0325 seconds.
Done reading from <_io.BytesIO object at 0x1057945e0> in 6.0218 seconds.
Seeking and reading files in one thread...
Done reading from <_io.BytesIO object at 0x1033d66b0> in 0.1178 seconds.
Done reading from <_io.BytesIO object at 0x10577a110> in 0.1169 seconds.
Done reading from <_io.BytesIO object at 0x105779c10> in 0.1148 seconds.
Done reading from <_io.BytesIO object at 0x10577b920> in 0.1150 seconds.
Done reading from <_io.BytesIO object at 0x105794450> in 0.1186 seconds.
Done reading from <_io.BytesIO object at 0x1057944a0> in 0.1162 seconds.
Done reading from <_io.BytesIO object at 0x1057944f0> in 0.1246 seconds.
Done reading from <_io.BytesIO object at 0x105794540> in 0.1185 seconds.
Done reading from <_io.BytesIO object at 0x105794590> in 0.1230 seconds.
Done reading from <_io.BytesIO object at 0x1057945e0> in 0.1217 seconds.
With 10 CPUs, threaded access took 6.08s.
With 10 CPUs, serial access took 1.27s.
With version 0.8.9, threading was 0.21x as fast as serial access.

Testing with Pedalboard 0.9.0
Creating test audio file of 158,760,000 frames at 44.1kHz...
Created test audio file of size 317,520,104 bytes.
Seeking and reading files in 10 threads...
Done reading from <_io.BytesIO object at 0x105a68040> in 0.3087 seconds.
Done reading from <_io.BytesIO object at 0x104f366b0> in 0.2262 seconds.
Done reading from <_io.BytesIO object at 0x106649cb0> in 0.2457 seconds.
Done reading from <_io.BytesIO object at 0x10664a1b0> in 0.2457 seconds.
Done reading from <_io.BytesIO object at 0x10664b9c0> in 0.2545 seconds.
Done reading from <_io.BytesIO object at 0x106664590> in 0.2463 seconds.
Done reading from <_io.BytesIO object at 0x106664630> in 0.2777 seconds.
Done reading from <_io.BytesIO object at 0x1066645e0> in 0.2861 seconds.
Done reading from <_io.BytesIO object at 0x106664680> in 0.2796 seconds.
Done reading from <_io.BytesIO object at 0x1066646d0> in 0.2754 seconds.
Seeking and reading files in one thread...
Done reading from <_io.BytesIO object at 0x105a68040> in 0.1729 seconds.
Done reading from <_io.BytesIO object at 0x104f366b0> in 0.1679 seconds.
Done reading from <_io.BytesIO object at 0x10664a1b0> in 0.1669 seconds.
Done reading from <_io.BytesIO object at 0x106649cb0> in 0.1658 seconds.
Done reading from <_io.BytesIO object at 0x10664b9c0> in 0.1669 seconds.
Done reading from <_io.BytesIO object at 0x106664590> in 0.1887 seconds.
Done reading from <_io.BytesIO object at 0x1066645e0> in 0.1739 seconds.
Done reading from <_io.BytesIO object at 0x106664630> in 0.1932 seconds.
Done reading from <_io.BytesIO object at 0x106664680> in 0.1873 seconds.
Done reading from <_io.BytesIO object at 0x1066646d0> in 0.1725 seconds.
With 10 CPUs, threaded access took 0.79s.
With 10 CPUs, serial access took 1.84s.
With version 0.9.0, threading was 2.34x faster than serial access.
```
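The summary lines report `serial_duration / threaded_duration`. As a quick sanity check, recomputing that ratio from the rounded totals printed above reproduces the 0.8.9 figure (0.21x); the 0.9.0 figure comes out as 2.33x rather than 2.34x because the script divides the unrounded durations before formatting them. The same totals show the threaded wall-clock time itself dropping about 7.7x between versions. `speedup` is a hypothetical helper, not part of the test script.

```python
def speedup(serial_s: float, threaded_s: float) -> float:
    """Same ratio the test script prints: above 1.0 means threading won."""
    return serial_s / threaded_s


# Rounded (serial, threaded) totals copied from the benchmark output.
results = {
    "0.8.9": (1.27, 6.08),
    "0.9.0": (1.84, 0.79),
}

for version, (serial_s, threaded_s) in results.items():
    ratio = speedup(serial_s, threaded_s)
    verdict = "faster than" if threaded_s < serial_s else "as fast as"
    print(f"{version}: threading was {ratio:.2f}x {verdict} serial access")

# Improvement in threaded wall-clock time across the two versions.
old_threaded = results["0.8.9"][1]
new_threaded = results["0.9.0"][1]
print(f"Threaded reads got {old_threaded / new_threaded:.1f}x faster")
```

This prints `0.21x as fast as` for 0.8.9, `2.33x faster than` for 0.9.0, and `7.7x faster` for the threaded improvement.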
Test script:

```python3
import os
import time
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

import numpy as np
from pedalboard import Resample, version
from pedalboard.io import AudioFile


def do_work(_io: BytesIO):
    # Open the in-memory WAV file, resampled to an arbitrary non-integer rate.
    with AudioFile(_io, "r").resampled_to(12345.67, Resample.Quality.Linear) as af:
        a = time.time()
        # Seek to the end and back to the start, then decode the whole file.
        af.seek(af.frames)
        af.seek(0)
        af.read(af.frames)
        b = time.time()
        print(f"Done reading from {_io} in {b - a:.4f} seconds.")


def main():
    stream = BytesIO()
    num_frames = 44100 * 60 * 60  # one hour of mono audio at 44.1kHz
    print(f"Testing with Pedalboard {version.__version__}")
    print(f"Creating test audio file of {num_frames:,} frames at 44.1kHz...")
    with AudioFile(stream, "w", 44100, 1, format="wav") as af:
        af.write(np.random.rand(num_frames))
    print(f"Created test audio file of size {len(stream.getvalue()):,} bytes.")

    # Give each worker its own copy of the file so readers never share a stream.
    num_cpus = os.cpu_count()
    ios = [BytesIO(stream.getvalue()) for _ in range(num_cpus)]

    print(f"Seeking and reading files in {num_cpus} threads...")
    with ThreadPoolExecutor(num_cpus) as executor:
        a = time.time()
        # do_work takes a single argument: the stream to read.
        futures = [executor.submit(do_work, _io) for _io in ios]
        for future in futures:
            future.result()
        b = time.time()
    threaded_duration = b - a

    print("Seeking and reading files in one thread...")
    a = time.time()
    for _io in ios:
        do_work(_io)
    b = time.time()
    serial_duration = b - a

    # Threaded access should be faster than serial access.
    # If the GIL is held when we read from the BytesIO stream, then the threaded
    # version of this will be 2-3x slower than the serial version.
    print(f"With {num_cpus} CPUs, threaded access took {threaded_duration:.2f}s.")
    print(f"With {num_cpus} CPUs, serial access took {serial_duration:.2f}s.")
    print(
        f"With version {version.__version__}, threading "
        f"was {serial_duration / threaded_duration:.2f}x "
        f"{'faster than' if threaded_duration < serial_duration else 'as fast as'} serial access."
    )


if __name__ == "__main__":
    main()
```