serenity-rs / songbird

An async Rust library for the Discord voice API
ISC License
381 stars 108 forks source link

Passing in `Opus` encoded audio to `play_source` for passthrough to discord only works if the audio is under 1432 bytes. #194

Closed Pranav-Badrinathan closed 1 year ago

Pranav-Badrinathan commented 1 year ago

Songbird version: 0.3.2

Rust version (rustc -V): 1.70.0

Serenity/Twilight version: Serenity 0.11.5

Output of ffmpeg -version, yt-dlp --version (if relevant): Not relavant

Description:

This could be a bug, or intended. When passing in Opus frames with Container::Dca to create an Input (relevant discussion) and passing it over to play_source to be played, the playback thread always panics if the frame size is over 1432 bytes exactly. This seems to have something to do with VOICE_PACKET_MAX, which is used in Mixer::new to initialize an array of zeroes.

The confusing part is, when debugging, I saw that the code panicked here, where buffer was a slice with only zeroes and length 1432 (NOT the data used to initialize Reader), but frame.frame_len was the length of the frame sent into play_source. This possibly looks like the array initialized with VOICE_PACKET_MAX that I mentioned above, but with a length value equal to the data passed into play_source.

Backtrace, if it is helpful. ``` thread '' panicked at 'range end index 3269 out of range for slice of length 1432', C:\Users\pranav\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\input\mod.rs:320:34 stack backtrace: 0: std::panicking::begin_panic_handler at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\std\src\panicking.rs:578 1: core::panicking::panic_fmt at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\panicking.rs:67 2: core::slice::index::slice_end_index_len_fail_rt at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\slice\index.rs:78 3: core::slice::index::slice_end_index_len_fail at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\slice\index.rs:70 4: core::slice::index::impl$4::index_mut at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:422 5: core::slice::index::impl$5::index_mut at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:463 6: core::slice::index::impl$1::index_mut > at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:31 7: songbird::input::Input::read_opus_frame at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\input\mod.rs:320 8: songbird::driver::tasks::mixer::mix_tracks at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:573 9: songbird::driver::tasks::mixer::Mixer::cycle at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:420 10: songbird::driver::tasks::mixer::Mixer::run at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:132 11: songbird::driver::tasks::mixer::runner at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:625 12: songbird::driver::tasks::start_internals::closure$1 at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mod.rs:58 ```

Steps to reproduce:

  1. Encode some audio in Opus.
  2. Add Dca spec required frame lengths at the start of each frame. Make sure that each frame is bigger than 1432 bytes in size.
  3. Create an Input using
    Input::new(
    stereo:    true,
    reader:    Reader::from_memory(buf: Vec<u8>), // Where buf is greater than 1432 in size.
    codec:     Codec::Opus(OpusDecoderState::new()::unwrap()),
    container: Container::Dca { first_frame: 0 },
    metadata:  None
    )
  4. Pass this input into play_source.

Sorry for the trouble is this is not a bug, I just can't completely follow the route the code takes to end up this way.

FelixMcFelix commented 1 year ago

To my knowledge this is intended; maybe it shouldn't panic, but equally your audio frames need to be small enough to fit in a single UDP packet without fragmenting. The current limit takes some headroom under the typical MTU since there's variation among ISPs -- it could stand to go up a bit, possibly.

My back of the napkin math suggests your frames are encoded at... 572kbps assuming 20ms each? I don't think Discord are smart enough to stop you regardless, but that does seem a tad extreme.

Pranav-Badrinathan commented 1 year ago

So in that case play_sound should be called per audio frame? In this case I was combining multiple frames and calling the function for each combined collection, hence the large size.

Pranav-Badrinathan commented 1 year ago

Closing this then, as this is not a bug. Thanks for taking the time to reply, and sorry for the inconvenience!

FelixMcFelix commented 1 year ago

Not necessarily per audio frame; if you're stuffing ~5 frames into a packet, if you know their length then you can just write out the length of each individually. I.e. (header) | len(frame1) | frame1 | len(frame2) | frame2 | len(frame3) | .... If this needs to be longer-lived, you could presumably write a custom Read implementation which blocks on receiving frames, reads into an internal buffer, and then handles this transformation for each packet.

Pranav-Badrinathan commented 1 year ago

I was doing the same without the header, and it was playing the first frame, skipping the rest. If it is required to qualify each packet as a file made up of multiple frames, I'll attempt it with the header.

If not, I can simply just play it frame by frame. Thanks!