Add ability to specify input device for ffmpeg_microphone()

Feature request

The function transformers.pipelines.audio_utils.ffmpeg_microphone() currently has the following code for setting the input device for ffmpeg:

def ffmpeg_microphone(
    sampling_rate: int,
    chunk_length_s: float,
    format_for_conversion: str = "f32le",
):
    """
    Helper function to read raw microphone data.
    """
   <....>
    system = platform.system()
    if system == "Linux":
        format_ = "alsa"
        input_ = "default"
    elif system == "Darwin":
        format_ = "avfoundation"
        input_ = ":0"
    elif system == "Windows":
        format_ = "dshow"
        input_ = _get_microphone_name()

This means the input device is fixed per platform and the caller cannot change it. On Linux, for example, the only option is the default ALSA input device. A user who wants a different device has no way to select one, short of changing the system-wide default device.

What I would like to see instead is an optional input_device=... parameter added to the ffmpeg_microphone() function that allows the user to specify a different input device.

If the caller passes nothing, the function would behave exactly as it does now (for example, using the ALSA default on Linux), so the change would not break existing code. If the caller passes a device in the function arguments, that device is used without having to change the system-wide default input device.
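A minimal sketch of that fallback behaviour, written as a standalone function so it can be checked without ffmpeg or a microphone. The name resolve_ffmpeg_input and its system and windows_default parameters are hypothetical and exist only for illustration; in the library, the Windows default comes from _get_microphone_name(), which is replaced here by a placeholder.

```python
import platform
from typing import Callable, Optional, Tuple


def resolve_ffmpeg_input(
    input_device: Optional[str] = None,
    system: Optional[str] = None,
    # Placeholder for transformers' private _get_microphone_name();
    # the returned string is an assumption, not a real device name.
    windows_default: Callable[[], str] = lambda: "audio=Microphone",
) -> Tuple[str, str]:
    """Return (ffmpeg input format, ffmpeg input device) for the platform."""
    system = system or platform.system()
    # Per-OS format plus a lazily evaluated default device.
    defaults = {
        "Linux": ("alsa", lambda: "default"),
        "Darwin": ("avfoundation", lambda: ":0"),
        "Windows": ("dshow", windows_default),
    }
    if system not in defaults:
        raise ValueError(f"Unsupported platform: {system}")
    format_, default_input = defaults[system]
    # A user-supplied device wins; None or "" falls back to the OS default.
    return format_, input_device or default_input()
```

Calling resolve_ffmpeg_input(system="Linux") yields the current behaviour, while resolve_ffmpeg_input("hw:1,0", system="Linux") selects a specific ALSA device. Only the device changes; the ffmpeg format stays tied to the operating system.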

Motivation

I want to use a different input device than the system default input. I do not want to make the change system-wide.

Your contribution

This would be an extremely simple change to make. Just change the function header from:

def ffmpeg_microphone(
    sampling_rate: int,
    chunk_length_s: float,
    format_for_conversion: str = "f32le",
):

to:

def ffmpeg_microphone(
    sampling_rate: int,
    chunk_length_s: float,
    format_for_conversion: str = "f32le",
    input_device=None,
):

Make the same change to the ffmpeg_microphone_live() function so that a different device can be used there as well.

Then, change this part:

    system = platform.system()
    if system == "Linux":
        format_ = "alsa"
        input_ = "default"
    elif system == "Darwin":
        format_ = "avfoundation"
        input_ = ":0"
    elif system == "Windows":
        format_ = "dshow"
        input_ = _get_microphone_name()

    ffmpeg_command = [
        "ffmpeg",
        "-f",
        format_,
        "-i",
        input_,
        "-ac",
        ac,
        "-ar",
        ar,
        "-f",
        format_for_conversion,
        "-fflags",
        "nobuffer",
        "-hide_banner",
        "-loglevel",
        "quiet",
        "pipe:1",
    ]

so that it uses the user-supplied input device if one is provided, and the OS-specific default otherwise:

input_ = input_device

system = platform.system()
if system == "Linux":
    format_ = "alsa"
    if not input_:
        input_ = "default"
elif system == "Darwin":
    format_ = "avfoundation"
    if not input_:
        input_ = ":0"
elif system == "Windows":
    format_ = "dshow"
    if not input_:
        input_ = _get_microphone_name()

ffmpeg_command = [
    "ffmpeg",
    "-f",
    format_,
    "-i",
    input_,
    "-ac",
    ac,
    "-ar",
    ar,
    "-f",
    format_for_conversion,
    "-fflags",
    "nobuffer",
    "-hide_banner",
    "-loglevel",
    "quiet",
    "pipe:1",
]
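The matching change to ffmpeg_microphone_live() mentioned earlier should amount to passing the new argument through, assuming that function delegates to ffmpeg_microphone(). The sketch below uses two stand-in functions (the _stub names are hypothetical and the real function takes more parameters than shown) so the forwarding can be checked without spawning ffmpeg.

```python
from typing import Iterator, Optional


def ffmpeg_microphone_stub(
    sampling_rate: int,
    chunk_length_s: float,
    format_for_conversion: str = "f32le",
    input_device: Optional[str] = None,
) -> Iterator[str]:
    # Stand-in for ffmpeg_microphone(): instead of reading audio it
    # reports which device it would have opened.
    yield input_device or "<OS default>"


def ffmpeg_microphone_live_stub(
    sampling_rate: int,
    chunk_length_s: float,
    format_for_conversion: str = "f32le",
    input_device: Optional[str] = None,
) -> Iterator[str]:
    # The only change the wrapper needs: forward input_device unchanged.
    yield from ffmpeg_microphone_stub(
        sampling_rate,
        chunk_length_s,
        format_for_conversion=format_for_conversion,
        input_device=input_device,
    )
```

Because the parameter defaults to None at both levels, existing callers of either function would see no difference.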
