s0d3s / PyAudioWPatch

🐍 PyAudio | PortAudio fork with WASAPI loopback support 🔊 Record audio from speakers on Windows

Extended Recording on Low-Resource Devices #11

Closed Nirvanatin closed 1 year ago

Nirvanatin commented 1 year ago

How can the pawp_simple_recording_app.py example be adjusted to optimize it for extended recording sessions on devices with limited storage and RAM?

s0d3s commented 1 year ago

Hi🖐 You asked a rather abstract question, but I'll try to answer.

To begin with, the example was written to process the audio stream only after it has been recorded in full, which is not optimal when device resources are limited.

It also uses a fairly wide sample format, paInt24. Narrowing it to paInt8 reduces the space needed to store audio fragments, at the cost of audio quality.

So, to sum up: to reduce the memory the application needs, process audio fragments as they arrive instead of after the recording has finished, and use a narrower sample format.

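Required memory table (uncompressed .WAV, etc.)

| Format  | Sample rate | Bit depth | Total memory (MB) |
|---------|-------------|-----------|-------------------|
| paInt24 | 44.1 kHz    | 24 bit    | 908.43            |
| paInt16 | 44.1 kHz    | 16 bit    | 605.62            |
| paInt8  | 44.1 kHz    | 8 bit     | 302.81            |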
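
The thread does not say which duration or channel count the table assumes. The figures are consistent with one hour of stereo audio measured in MiB (1024 * 1024 bytes), so the sketch below uses those two values as assumptions. `uncompressed_size_mib` is a hypothetical helper name, not part of PyAudioWPatch.

```python
def uncompressed_size_mib(sample_rate_hz, bytes_per_sample, channels, seconds):
    """Size of raw PCM audio in MiB: rate * sample width * channels * duration."""
    total_bytes = sample_rate_hz * bytes_per_sample * channels * seconds
    return total_bytes / (1024 * 1024)


# Assumption: one hour (3600 s) of stereo (2 channels) audio at 44.1 kHz
for name, width in (("paInt24", 3), ("paInt16", 2), ("paInt8", 1)):
    size = uncompressed_size_mib(44_100, width, channels=2, seconds=3600)
    print(f"{name}: {size:.2f}")
# paInt24: 908.43
# paInt16: 605.62
# paInt8: 302.81
```

The same function gives a quick estimate for other durations, for example `seconds=8 * 3600` for an eight-hour session.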

What can I do to reduce the file size of recorded audio?

Nirvanatin commented 1 year ago

Hey there, I appreciate your response. I have modified your example code to immediately process audio fragments and save them as FLAC files. The code has undergone some hasty modifications and may not be reliable in various situations. However, I'm still facing the challenge of accomplishing this without relying on WAV format.

Additionally, I would greatly appreciate any suggestions you may have regarding simultaneously recording two audio inputs. At the moment, I'm dependent on OBS Studio and VB-Audio Virtual Cable to apply RNNoise suppression to my microphone inputs. Could you suggest a simpler solution that can be implemented entirely in Python?

import pyaudiowpatch as pyaudio
import wave
import soundfile as sf

filename = "loopback_record_class.wav"
compressed_filename = "loopback_record_class.flac"
data_format = pyaudio.paInt24

class ARException(Exception):
    """Base class for AudioRecorder's exceptions"""
    ...

class WASAPINotFound(ARException):
    ...

class InvalidDevice(ARException):
    ...

class AudioRecorder:
    def __init__(self, p_audio: pyaudio.PyAudio, wave_file: wave.Wave_write):
        self.p = p_audio
        self.wave_file = wave_file
        self.stream = None

    @staticmethod
    def get_default_wasapi_device(p_audio: pyaudio.PyAudio):
        try:  # Get default WASAPI info
            wasapi_info = p_audio.get_host_api_info_by_type(pyaudio.paWASAPI)
        except OSError:
            raise WASAPINotFound("Looks like WASAPI is not available on the system")

        # Get default WASAPI speakers
        sys_default_speakers = p_audio.get_device_info_by_index(wasapi_info["defaultOutputDevice"])

        if sys_default_speakers["isLoopbackDevice"]:
            # The default device is already a loopback device, use it as is
            return sys_default_speakers

        # Otherwise look for the loopback counterpart of the default speakers
        for loopback in p_audio.get_loopback_device_info_generator():
            if sys_default_speakers["name"] in loopback["name"]:
                return loopback

        raise InvalidDevice("Default loopback output device not found.\n\nRun `python -m pyaudio` to check available devices")

    def callback(self, in_data, frame_count, time_info, status):
        """Write frames to file immediately and return PA flag"""
        self.wave_file.writeframes(in_data)
        return (None, pyaudio.paContinue)

    def start_recording(self, target_device: dict):
        self.close_stream()

        self.stream = self.p.open(format=data_format,
                                  channels=target_device["maxInputChannels"],
                                  rate=int(target_device["defaultSampleRate"]),
                                  frames_per_buffer=pyaudio.get_sample_size(pyaudio.paInt24),
                                  input=True,
                                  input_device_index=target_device["index"],
                                  stream_callback=self.callback
                                  )

    def stop_stream(self):
        self.stream.stop_stream()

    def start_stream(self):
        self.stream.start_stream()

    def close_stream(self):
        if self.stream is not None:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None

    @property
    def stream_status(self):
        return "closed" if self.stream is None else "stopped" if self.stream.is_stopped() else "running"

if __name__ == "__main__":
    p = pyaudio.PyAudio()
    ar = None

    help_msg = 30 * "-" + "\n\n\nStatus:\nRunning=%s | Device=%s | output=%s\n\nCommands:\nlist\nrecord {device_index\\default}\npause\ncontinue\nstop {*.wav\\default}\n"
    target_device = None
    wave_file = None

    try:
        while True:
            print(help_msg % (ar.stream_status if ar else "closed", target_device["index"] if target_device else "None", filename))
            com = input("Enter command: ").split()
            if not com:  # Empty input: ask again instead of failing on com[0]
                continue

            if com[0] == "list":
                p.print_detailed_system_info()

            elif com[0] == "record":
                if ar:
                    # Stop the previous stream before closing the file it writes to
                    ar.close_stream()
                if wave_file:
                    wave_file.close()

                if len(com) > 1 and com[1].isdigit():
                    target_device = p.get_device_info_by_index(int(com[1]))
                else:    
                    try:
                        target_device = AudioRecorder.get_default_wasapi_device(p)
                    except ARException as E:
                        print(f"Something went wrong... {type(E)} = {str(E)[:30]}...\n")
                        continue

                wave_file = wave.open(filename, 'wb')
                wave_file.setnchannels(target_device["maxInputChannels"])
                wave_file.setsampwidth(pyaudio.get_sample_size(data_format))
                wave_file.setframerate(int(target_device["defaultSampleRate"]))

                ar = AudioRecorder(p, wave_file)
                ar.start_recording(target_device)

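            elif com[0] in ("pause", "continue", "stop") and ar is None:
                print("Nothing is being recorded yet, use `record` first")
            elif com[0] == "pause":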
                ar.stop_stream()
            elif com[0] == "continue":
                ar.start_stream()
            elif com[0] == "stop":
                ar.close_stream()
                wave_file.close()

                # Compress the recorded audio to FLAC format
                data, _ = sf.read(filename)
                # sf.write(compressed_filename, data, target_device["defaultSampleRate"], format="FLAC")
                sf.write(compressed_filename, data, int(target_device["defaultSampleRate"]), format="FLAC")

                print(f"The audio is written to [{filename}] and compressed to [{compressed_filename}]. Exit...")
                break

            else:
                print(f"[{com[0]}] is an unknown command")

    except KeyboardInterrupt:
        print("\n\nExit without saving...")
    finally:
        if ar:
            ar.close_stream()
        if wave_file:
            wave_file.close()
        p.terminate()

s0d3s commented 1 year ago

⚠ Compressing audio without buffering is not the best idea, because compressing small fragments may not be efficient enough.

But you can do it like this:

import pyaudiowpatch as pyaudio
import soundfile as sf
from typing import Optional

filename = "loopback_record_class.flac"

format_from_pya_2_sf = {
    pyaudio.paInt16: "int16",
    pyaudio.paInt32: "int32",
    # pass
}

data_format = pyaudio.paInt16

if data_format not in format_from_pya_2_sf:
    raise ValueError("Are you sure that SoundFile accepts this format?")

sf_data_format = format_from_pya_2_sf[data_format]

class ARException(Exception):
    """Base class for AudioRecorder's exceptions"""
    ...

class WASAPINotFound(ARException):
    ...

class InvalidDevice(ARException):
    ...

class AudioRecorder:
    def __init__(self, p_audio: pyaudio.PyAudio, output_file_name: str):
        self.p = p_audio
        self.output_file_name = output_file_name
        self.stream = None # type: Optional[pyaudio.Stream]
        self.output_sf = None # type: Optional[sf.SoundFile]

    @staticmethod
    def get_default_wasapi_device(p_audio: pyaudio.PyAudio):
        try:  # Get default WASAPI info
            wasapi_info = p_audio.get_host_api_info_by_type(pyaudio.paWASAPI)
        except OSError:
            raise WASAPINotFound("Looks like WASAPI is not available on the system")

        # Get default WASAPI speakers
        sys_default_speakers = p_audio.get_device_info_by_index(wasapi_info["defaultOutputDevice"])

        if sys_default_speakers["isLoopbackDevice"]:
            # The default device is already a loopback device, use it as is
            return sys_default_speakers

        # Otherwise look for the loopback counterpart of the default speakers
        for loopback in p_audio.get_loopback_device_info_generator():
            if sys_default_speakers["name"] in loopback["name"]:
                return loopback

        raise InvalidDevice(
            "Default loopback output device not found.\n\n"
            "Run `python -m pyaudio` to check available devices"
        )

    def callback(self, in_data, frame_count, time_info, status):
        """Write frames to file immediately and return PA flag"""
        self.output_sf.buffer_write(in_data, sf_data_format)
        return in_data, pyaudio.paContinue

    def start_recording(self, target_device: dict, output_file_name: Optional[str] = None):
        self.close_stream()

        sample_rate = int(target_device["defaultSampleRate"])

        self.output_sf = sf.SoundFile(
            output_file_name or self.output_file_name,
            mode="w",
            format="FLAC",
            channels=target_device["maxInputChannels"],
            samplerate=sample_rate,
        )

        self.stream = self.p.open(
            format=data_format,
            channels=target_device["maxInputChannels"],
            rate=sample_rate,
            frames_per_buffer=pyaudio.get_sample_size(data_format),
            input=True,
            input_device_index=target_device["index"],
            stream_callback=self.callback
        )

    def stop_stream(self):
        self.stream.stop_stream()

    def start_stream(self):
        self.stream.start_stream()

    def close_stream(self):
        if self.stream is not None:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None
            self.output_sf.close()

    @property
    def stream_status(self):
        return "closed" if self.stream is None else "stopped" if self.stream.is_stopped() else "running"

if __name__ == "__main__":
    p = pyaudio.PyAudio()
    ar = None

    help_msg = 30 * "-" + "\n\n\nStatus:\nRunning=%s | Device=%s | output=%s\n\nCommands:\nlist\nrecord {device_index\\default}\npause\ncontinue\nstop\n"
    target_device = None

    try:
        while True:
            print(
                help_msg % (
                    ar.stream_status if ar else "closed",
                    target_device["index"] if target_device else "None",
                    filename,
                )
            )
            com = input("Enter command: ").split()
            if not com:  # Empty input: ask again instead of failing on com[0]
                continue

            if com[0] == "list":
                p.print_detailed_system_info()

            elif com[0] == "record":

                if len(com) > 1 and com[1].isdigit():
                    target_device = p.get_device_info_by_index(int(com[1]))
                else:
                    try:
                        target_device = AudioRecorder.get_default_wasapi_device(p)
                    except ARException as E:
                        print(f"Something went wrong... {type(E)} = {str(E)[:30]}...\n")
                        continue

                if ar:
                    # Finish the previous recording before starting a new one
                    ar.close_stream()
                ar = AudioRecorder(p, filename)
                ar.start_recording(target_device)

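            elif com[0] in ("pause", "continue", "stop") and ar is None:
                print("Nothing is being recorded yet, use `record` first")
            elif com[0] == "pause":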
                ar.stop_stream()
            elif com[0] == "continue":
                ar.start_stream()
            elif com[0] == "stop":
                ar.close_stream()
                print(f"The audio is written to [{filename}]. Exit...")
                break

            else:
                print(f"[{com[0]}] is an unknown command")

    except KeyboardInterrupt:
        print("\n\nExit without saving...")
    finally:
        if ar:
            ar.close_stream()
        p.terminate()

Also, I wouldn't install soundfile just to use FLAC. I'd rather choose pyflac, but it's up to you.
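
One possible way to address the buffering concern raised at the top of this comment is to collect fragments in a queue and hand them to the encoder in larger blocks from a separate thread. This is only a sketch: `BufferedWriter`, `write_block` and `min_block_bytes` are hypothetical names, and the 64 KiB threshold is an arbitrary assumption rather than a tuned value.

```python
import queue
import threading


class BufferedWriter:
    """Collect small audio fragments and pass them on in larger blocks.

    `write_block` is any callable that accepts bytes (hypothetical wiring,
    for example a function that forwards to SoundFile.buffer_write).
    """

    def __init__(self, write_block, min_block_bytes=64 * 1024):
        self._write_block = write_block
        self._min_block_bytes = min_block_bytes
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, fragment: bytes) -> None:
        """Called from the audio callback: only enqueues, does no file I/O."""
        self._queue.put(fragment)

    def close(self) -> None:
        """Flush whatever is still pending and stop the worker thread."""
        self._queue.put(None)  # Sentinel that tells the worker to finish
        self._thread.join()

    def _run(self) -> None:
        pending = bytearray()
        while True:
            fragment = self._queue.get()
            if fragment is None:
                break
            pending += fragment
            if len(pending) >= self._min_block_bytes:
                self._write_block(bytes(pending))
                pending.clear()
        if pending:  # Write the final, possibly smaller, block
            self._write_block(bytes(pending))
```

In the recorder above, the callback would then call `writer.put(in_data)`, and `write_block` could be `lambda block: self.output_sf.buffer_write(block, sf_data_format)`. `close()` has to run before the SoundFile is closed. Because blocks are concatenations of whole fragments, they stay frame-aligned.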

s0d3s commented 1 year ago

Now about your second question. RNNoise is a neural network implemented in C; it is not part of OBS itself (it is integrated via a plugin). So you can write a Python wrapper around the C code and use it directly. Someone has probably already implemented such a wrapper.

To record from several sources at once, you just need a second pyaudio.Stream instance. With callbacks there should be no problems. With blocking reads, you will probably need threads.
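
A minimal sketch of the callback approach, assuming each source is written to its own WAV file. The helper names (`make_writer_callback`, `open_wave_for`, `record_sources`) are hypothetical. The device dictionaries are expected to come from `get_device_info_by_index` or the loopback generator, as in the code earlier in this thread.

```python
import time
import wave


def make_writer_callback(wave_file, continue_flag):
    """Return a PortAudio-style callback that appends every buffer to wave_file."""
    def callback(in_data, frame_count, time_info, status):
        wave_file.writeframes(in_data)
        return None, continue_flag
    return callback


def open_wave_for(device, path, sample_width):
    """Open a WAV file whose header matches the device's channels and rate."""
    wave_file = wave.open(path, "wb")
    wave_file.setnchannels(device["maxInputChannels"])
    wave_file.setsampwidth(sample_width)
    wave_file.setframerate(int(device["defaultSampleRate"]))
    return wave_file


def record_sources(devices_and_paths, seconds=5.0):
    """Record from every (device_info, path) pair at the same time."""
    import pyaudiowpatch as pyaudio  # Imported lazily: Windows only

    data_format = pyaudio.paInt16
    sample_width = pyaudio.get_sample_size(data_format)
    p = pyaudio.PyAudio()
    files, streams = [], []
    try:
        for device, path in devices_and_paths:
            wave_file = open_wave_for(device, path, sample_width)
            files.append(wave_file)
            # One stream per source; each starts immediately (start=True by default)
            streams.append(p.open(
                format=data_format,
                channels=device["maxInputChannels"],
                rate=int(device["defaultSampleRate"]),
                input=True,
                input_device_index=device["index"],
                stream_callback=make_writer_callback(wave_file, pyaudio.paContinue),
            ))
        time.sleep(seconds)  # Callbacks run in the background meanwhile
    finally:
        for stream in streams:
            stream.stop_stream()
            stream.close()
        for wave_file in files:
            wave_file.close()
        p.terminate()
```

A call could look like `record_sources([(loopback_device, "speakers.wav"), (mic_device, "mic.wav")], seconds=10)`. The two files are not sample-synchronised with each other, so mixing them afterwards may need alignment.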

Your questions are not related to this fork, so it would be more appropriate to post them on Stack Overflow. If you have no further questions about pyaudiowpatch, please close the issue.