hraban / opus

Go wrapper for libopus (golang)
MIT License
275 stars · 59 forks

Strange distortion when decoding #49

Closed brandonsturgeon closed 2 years ago

brandonsturgeon commented 2 years ago

Hi,

I'm remaking an app I wrote in Node in Golang. The app receives proprietary voice packets, plucks out the opus data, and decodes it into PCM.

The Node app works great, it produces clear audio just as expected.

My Golang app is producing audio with weird bits of distortion: popping, and a vaguely robotic tone to the voice. It's strange. I ran the two apps in parallel to see where the disconnect was; they behave identically right up until they decode the opus data.

Here's a spectrogram comparison of the outputs (top is Go, bottom is Node): [image: audacity_XFyQ1VhfC1]

You can see those thick vertical bars in the Golang output that aren't present in the other.

I've checked, double checked, and triple checked that I'm reading the voice data correctly, not dropping precision anywhere, and that I'm setting up the decoder properly. I've also verified that both containers have the same version of libopus-dev and libopusfile-dev.

I've tried Decoder.Decode and Decoder.DecodeFloat32 - same result.

My input opus audio is 1 channel at 24 kHz (16 bit).

I'm just at a loss as to how else to debug this. Does this info spark any ideas with you? Do you have any suggestions for how I could proceed?

Most of my code is decomposing the proprietary voice packets, so I'm not sure how helpful it'd be to share what I have, but I'd be happy to if you think it'd help.

Thank you!

hraban commented 2 years ago

Hi Brandon, I think you forgot to attach the spectrograms.

My first step for debugging this would be to compare the raw opus data extracted by your node and go programs: if they're equal, then the problem lies with decoding the packets. If they're not, then it's in the extraction of opus data from that proprietary stream.

I'd output them in a format that's very easy to compare, e.g. a text file with one hex-encoded packet per line. If there is any diff, that hopefully makes it easy to identify where the issue is.

If you've done that and the opus data is the same, I'd check one more thing: are you actually receiving raw opus packets, or are you receiving an Ogg/Opus stream? (See the README for more details on this.)

If you've done all that and you're sure it's raw opus data coming in and they're exactly the same between nodejs and go, I'd have another close look at the initialisation of the decoder: are all the parameters passed exactly right? Are you extracting the resulting PCM data correctly? Are you handling the mono vs stereo case correctly? (NB that libopus returns a single channel as a single array, which might not fit your PCM device API).
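That mono-vs-stereo caveat is a common trip-up: if the playback or file-writing API expects interleaved stereo, the single mono array from libopus has to be expanded first. A minimal, self-contained sketch (function name is illustrative, not part of this wrapper):

```go
package main

import "fmt"

// monoToStereo duplicates each mono sample into left and right slots,
// producing the interleaved L/R layout most stereo PCM APIs expect.
// Only needed when the output device wants two channels while libopus
// hands you one.
func monoToStereo(mono []int16) []int16 {
	stereo := make([]int16, 2*len(mono))
	for i, s := range mono {
		stereo[2*i] = s   // left
		stereo[2*i+1] = s // right
	}
	return stereo
}

func main() {
	fmt.Println(monoToStereo([]int16{1, 2, 3})) // [1 1 2 2 3 3]
}
```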

For context: this wrapper is extremely thin. It doesn't do any allocation or data handling of the actual audio: it's a direct, zero-copy passthrough to libopus. If the Node.js lib you're using does some smarter handling of the data, there's another avenue for bugs to creep in when switching to this lib.

Good luck 🙂

hraban commented 2 years ago

I'm closing this until further update indicates that it is specifically an issue with this library.

brandonsturgeon commented 2 years ago

Thanks for the very helpful reply!

After a great deal more debugging, I didn't get any further.

The data was the exact same going into the decoder in my Go project as in my Node project.

I also tried another Go Opus library but got the same result - so it's clearly not a problem specifically with this library.

Still, the mystery remains 🤦‍♂️

hraban commented 2 years ago

@brandonsturgeon what was the node library you were using?

brandonsturgeon commented 2 years ago

We're using Discord's Opus library: https://github.com/discordjs/opus

Comparing your call to libopus:

    n := int(C.opus_decode(
        dec.p,
        (*C.uchar)(&data[0]),
        C.opus_int32(len(data)),
        (*C.opus_int16)(&pcm[0]),
        C.int(cap(pcm)/dec.channels),
        0))

To discordjs':

    int decodedSamples = opus_decode(
        this->decoder,
        compressedData,
        compressedDataLength,
        &(this->outPcm[0]),
        MAX_FRAME_SIZE,
        /* decode_fec */ 0
    );

They're practically identical, so... I have no idea.

To my eyes, the only difference is the second-to-last parameter. You use the capacity of the output PCM buffer (per channel). They use MAX_FRAME_SIZE, which I believe is:

    #define MAX_FRAME_SIZE 6 * 960

I actually tried modifying your wrapper to pass 6 * 960 for that parameter and found zero difference in the output. Still garbled.
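For reference, that constant corresponds to the largest amount of audio a single Opus packet may carry: 120 ms, which at 48 kHz output is 6 frames of 960 samples (20 ms each). A quick sanity check of the arithmetic:

```go
package main

import "fmt"

// maxFrameSamples returns the per-channel PCM buffer size needed for the
// longest packet Opus allows (120 ms) at a given output sample rate.
func maxFrameSamples(sampleRate int) int {
	const maxPacketMs = 120
	return sampleRate * maxPacketMs / 1000
}

func main() {
	fmt.Println(maxFrameSamples(48000)) // 5760 == 6 * 960, matching MAX_FRAME_SIZE
	fmt.Println(maxFrameSamples(24000)) // 2880 for a 24 kHz stream like this issue's
}
```

So either value works as long as the buffer is at least as large as the biggest packet actually received, which is why swapping the constant in changed nothing.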

brandonsturgeon commented 2 years ago

Fixed it!

I was creating a new instance of the decoder every time I needed to use it.

Creating just one of them and re-using it for subsequent packets works flawlessly 👍
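This fix lines up with how libopus works: an Opus decoder carries state across packets (prediction and overlap between frame boundaries), so constructing a fresh decoder for every packet discards that continuity and produces exactly the popping described above. A sketch of the fixed structure, with a stand-in decoder since the real `*opus.Decoder` needs cgo and libopus; every name here (`voiceSession`, `fakeDecoder`, etc.) is hypothetical:

```go
package main

import "fmt"

// packetDecoder mimics the shape of (*opus.Decoder).Decode from
// hraban/opus: consume one packet, fill a PCM buffer, return the
// number of samples written per channel.
type packetDecoder interface {
	Decode(data []byte, pcm []int16) (int, error)
}

// fakeDecoder stands in for the cgo-backed decoder; the seen counter
// illustrates that a real decoder accumulates state across packets.
type fakeDecoder struct{ seen int }

func (d *fakeDecoder) Decode(data []byte, pcm []int16) (int, error) {
	d.seen++
	return len(pcm), nil
}

// voiceSession owns ONE decoder for the lifetime of a stream and
// reuses it for every packet -- the fix from this issue.
type voiceSession struct {
	dec packetDecoder
	pcm []int16
}

func (s *voiceSession) handlePacket(data []byte) ([]int16, error) {
	n, err := s.dec.Decode(data, s.pcm)
	if err != nil {
		return nil, err
	}
	return s.pcm[:n], nil
}

func main() {
	s := &voiceSession{dec: &fakeDecoder{}, pcm: make([]int16, 960)}
	for i := 0; i < 3; i++ {
		if _, err := s.handlePacket([]byte{0x01}); err != nil {
			panic(err)
		}
	}
	fmt.Println(s.dec.(*fakeDecoder).seen) // prints 3: one shared decoder saw every packet
}
```

The essential point is that the decoder lives as long as the stream it decodes; only a genuinely new stream warrants a new decoder.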