torchaudio: need to save segmented chunks as Opus, e.g. torchaudio.save(..., format='opus', ...)

npovey commented 6 months ago

🚀 The feature

I would like to save the file as Opus, for example:

    torchaudio.save(
        os.path.join(args.dst_dir, segment_path),
        waveform.unsqueeze(0),
        sample_rate=16000,
        format='opus',  # <---- currently I cannot do this
    )

Motivation, pitch

I want to extract subsegments as in https://github.com/SpeechColab/GigaSpeech/blob/main/utils/extract_subset_segments.py. That script saves the segments as WAV files, but I would like to save them as Opus.
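The cutting step itself is index arithmetic: a segment from start_s to end_s seconds covers samples round(start_s * sample_rate) up to (but not including) round(end_s * sample_rate). A minimal sketch, assuming boundaries are given in seconds; slice_segment and its parameters are hypothetical names, not the GigaSpeech script's API. The same slicing should work on a 1-D torch tensor, whose result could then be handed to torchaudio.save.

```python
from typing import Sequence


def slice_segment(samples: Sequence[float], sample_rate: int,
                  start_s: float, end_s: float) -> Sequence[float]:
    """Return the samples between start_s and end_s (in seconds), clamped to the signal."""
    if end_s < start_s:
        raise ValueError("end_s must not be earlier than start_s")
    start = max(0, round(start_s * sample_rate))         # first sample index
    end = min(len(samples), round(end_s * sample_rate))  # one past the last sample index
    return samples[start:end]


if __name__ == "__main__":
    signal = [0.0] * (3 * 16000)  # 3 s of silence at 16 kHz
    segment = slice_segment(signal, 16000, 0.5, 1.25)
    print(len(segment))           # 12000 samples, i.e. 0.75 s
```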

Alternatives

Maybe use ffmpeg -i BIG_FILE -acodec copy -ss START_TIME -to END_TIME LITTLE_FILE, from here: https://unix.stackexchange.com/questions/280767/how-do-i-split-an-audio-file-into-multiple
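One caveat with this alternative: -acodec copy keeps the source codec, so cutting a WAV source this way will not produce Opus. Getting Opus output would require re-encoding, e.g. with FFmpeg's libopus encoder (the FFmpeg build quoted later in this thread is configured with --enable-libopus). A sketch that builds such a command from Python; the function name ffmpeg_cut_to_opus and the 32k default bit rate are illustrative assumptions.

```python
import shlex
from typing import List


def ffmpeg_cut_to_opus(src: str, dst: str, start_s: float, end_s: float,
                       bitrate: str = "32k") -> List[str]:
    """Build an ffmpeg command that cuts [start_s, end_s] from src and encodes it as Opus."""
    return [
        "ffmpeg", "-y",       # overwrite dst if it already exists
        "-i", src,
        "-ss", str(start_s),  # placed after -i: decode and discard up to start_s
        "-to", str(end_s),    # stop at end_s
        "-c:a", "libopus",    # re-encode; "copy" would keep the source codec
        "-b:a", bitrate,      # target Opus bit rate (illustrative default)
        dst,
    ]


if __name__ == "__main__":
    cmd = ffmpeg_cut_to_opus("BIG_FILE.wav", "LITTLE_FILE.opus", 12.5, 19.0)
    print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True); building the argument list rather than a shell string avoids quoting problems with file names.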

Additional context

No response

mthrok commented 6 months ago

What error are you seeing?

I just tried it on Google Colab, and it seems to work fine with the FFmpeg backend.
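A minimal sketch of the working call, assuming torchaudio 2.1 or later (where torchaudio.save accepts a backend argument) and an FFmpeg build with libopus. The helper names save_segment_as_opus and opus_segment_path are hypothetical; the directory layout mirrors the snippet in the issue.

```python
import os


def opus_segment_path(dst_dir: str, aid: str, sid: str) -> str:
    """Build <dst_dir>/audio/<aid>/<sid>.opus, mirroring the layout used in the issue."""
    return os.path.join(dst_dir, "audio", aid, f"{sid}.opus")


def save_segment_as_opus(waveform, sample_rate: int, path: str) -> None:
    """Save a 1-D waveform tensor as Ogg Opus through torchaudio's FFmpeg backend.

    Assumes torchaudio >= 2.1 and FFmpeg libraries built with libopus.
    """
    import torchaudio  # deferred so the path helper is usable without torchaudio

    os.makedirs(os.path.dirname(path), exist_ok=True)
    torchaudio.save(
        path,
        waveform.unsqueeze(0),  # torchaudio.save expects (channels, frames)
        sample_rate,
        format="opus",
        backend="ffmpeg",
    )
```

Usage would look like save_segment_as_opus(waveform, 16000, opus_segment_path(args.dst_dir, aid, sid)). Passing backend explicitly makes the failure obvious if FFmpeg is not available, instead of silently falling back to another backend.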

[Screenshot taken 2023-12-15 at 9:39:41 AM]

npovey commented 6 months ago

Hi, thanks, it worked for me after updating torch from 1.12 to 2.1. But when I inspect the file with ffmpeg -i example.opus, it reports "72 kb/s".
Question: the bits_per_sample = 16 flag does not seem to work.

    import os
    import torchaudio

    # aid, sid, waveform (1-D tensor) and args come from the surrounding segmentation loop
    segment_path = os.path.join('audio', aid, f'{sid}.opus')
    os.makedirs(os.path.join(args.dst_dir, os.path.dirname(segment_path)), exist_ok=True)
    torchaudio.save(
        os.path.join(args.dst_dir, segment_path),
        waveform.unsqueeze(0),
        sample_rate=16000,
        bits_per_sample=16,
        encoding="PCM_S",
    )
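A plausible reading of the bits_per_sample question, offered as an assumption rather than a confirmed answer: Opus is a lossy codec, so bits_per_sample and encoding="PCM_S" describe uncompressed PCM and are unlikely to control the size of the Opus stream, while the "72 kb/s" that ffmpeg prints is the file's average bit rate. The ffmpeg output in this comment also shows 48000 Hz and fltp, which, as far as I know, is how FFmpeg always reports decoded Ogg Opus regardless of the input sample width. The arithmetic below compares the reported rate with 16 kHz, 16-bit mono PCM; the function names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate * bits_per_sample * channels / 1000


def approx_file_bytes(bitrate_kbps: float, duration_s: float) -> int:
    """File size implied by an average bit rate sustained for duration_s seconds."""
    return round(bitrate_kbps * 1000 * duration_s / 8)


if __name__ == "__main__":
    pcm = pcm_bitrate_kbps(16000, 16, 1)  # 16 kHz, 16-bit, mono, as in the issue
    print(pcm)                            # 256.0
    print(round(pcm / 72, 2))             # PCM rate vs. the reported 72 kb/s: 3.56
    print(approx_file_bytes(72, 6.73))    # rough size of the 6.73 s file, in bytes
```

If a specific Opus bit rate is needed, the compression argument of torchaudio.save is the place to look; what it accepts for the FFmpeg backend depends on the torchaudio version, so check the documentation for the installed release.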

ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, ogg, from 'example.opus':
  Duration: 00:00:06.73, start: 0.000000, bitrate: 72 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
    Metadata:
      encoder         : Lavf58.29.100
At least one output file must be specified
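The final "At least one output file must be specified" line is expected: ffmpeg -i with no output prints the stream information and then stops. ffprobe, which ships with FFmpeg, is meant for inspection only. A sketch that builds an ffprobe command and summarizes its JSON output; ffprobe_cmd, summarize and probe_audio are hypothetical helper names, and probe_audio assumes ffprobe is on PATH.

```python
import json
import subprocess
from typing import Any, Dict, List


def ffprobe_cmd(path: str) -> List[str]:
    """ffprobe invocation that prints codec, sample rate, channels and bit rate as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries",
        "stream=codec_name,sample_rate,channels,sample_fmt:format=duration,bit_rate",
        "-of", "json",
        path,
    ]


def summarize(probe_json: str) -> Dict[str, Any]:
    """Pick the fields relevant to this issue out of ffprobe's JSON output."""
    info = json.loads(probe_json)
    stream = info["streams"][0]  # first (here the only) audio stream
    return {
        "codec": stream.get("codec_name"),
        "sample_rate": int(stream["sample_rate"]),  # ffprobe reports this as a string
        "channels": stream.get("channels"),
        "bit_rate_kbps": int(info["format"]["bit_rate"]) / 1000,
    }


def probe_audio(path: str) -> Dict[str, Any]:
    """Run ffprobe on path and summarize the result."""
    result = subprocess.run(ffprobe_cmd(path), check=True,
                            capture_output=True, text=True)
    return summarize(result.stdout)
```

For example, print(probe_audio("example.opus")) would print the codec, sample rate, channel count and average bit rate without the "At least one output file" error.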
