Need help with decoding the incoming audio

aygupt1822 commented 2 years ago

I am trying to write a Python client for Zello using the API documentation from the Zello GitHub repo.

I am having problems decoding the raw data.

The documentation of the Zello library sucks as it clearly doesn't explain anything related to the decoding part. I also looked at some other similar issues raised by other users but the answers were not satisfactory at all.

I sincerely need help with this.

AndyW999 commented 1 year ago

Do you mean the JSON or the Opus codec?

Both are well documented and used everywhere - try Google?

tlstpierre commented 8 months ago

I was able to get this to work in Go without any issues. This is binary data, so you may have to do a bit of byte-level parsing to make it work:

1. Look at the first byte of the binary message. If it is 0x01, this is an audio stream packet.
2. Take bytes 1-4 (counting from zero, so the four bytes right after the type byte) and decode them as a big-endian Uint32. Not sure how to do this in Python, but presumably there is a function that takes a four-byte slice of the byte array and gives you a Uint32 back. This is your stream ID (save it to a variable somewhere).
3. Take bytes 5-8 and decode them the same way. This is your packet ID, which increments with every packet.
4. The rest of the binary data, from byte 9 to the end, is your Opus-encoded audio. Send that to your Opus decoder to get the audio frame.
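
For the Python side, the steps above could be sketched with the standard `struct` module. This is a minimal sketch, not an official Zello client: the names `AudioPacket`, `parse_audio_packet`, `HEADER` and `AUDIO_PACKET_TYPE` are made up for illustration, and the layout (one type byte, then two big-endian Uint32 values, then Opus data) is taken from the description in this comment.

```python
import struct
from typing import NamedTuple, Optional

# Values and layout as described in the steps above; names are hypothetical.
AUDIO_PACKET_TYPE = 0x01
# ">" = big-endian, no padding; B = uint8 type, I = uint32 stream ID, I = uint32 packet ID.
HEADER = struct.Struct(">BII")  # 9 bytes in total


class AudioPacket(NamedTuple):
    stream_id: int
    packet_id: int
    opus_data: bytes


def parse_audio_packet(message: bytes) -> Optional[AudioPacket]:
    """Return the parsed packet, or None if this is not an audio message."""
    if len(message) < HEADER.size or message[0] != AUDIO_PACKET_TYPE:
        return None
    _, stream_id, packet_id = HEADER.unpack_from(message, 0)
    # Everything after the 9-byte header is the Opus payload.
    return AudioPacket(stream_id, packet_id, bytes(message[HEADER.size:]))


# Usage with a hand-built message: type 0x01, stream ID 1234, packet ID 7, two payload bytes.
msg = bytes([0x01]) + (1234).to_bytes(4, "big") + (7).to_bytes(4, "big") + b"\xde\xad"
packet = parse_audio_packet(msg)
```

The `opus_data` field is what you would hand to whichever Opus binding you use; the decoder call itself is left out here because it depends on that library.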

Here are a few other tips that might help:

- To initialize your Opus decoder, you need the format information from the stream start message. In my application, I use this to create a data structure to receive the audio, open up the sound card stream, etc.
- You will need some sort of jitter buffer to hold the incoming audio frames before you play them out, so that the output doesn't get starved if a packet arrives late. It isn't a big concern with TCP-based audio, but you may want to look at the packet ID numbers to make sure they arrive in order and increment without a gap.
- Here's an example of the signal flow: incoming packet > decode header (stream ID and packet ID) > Opus decoder > slice (array?) of Int16 audio samples > buffer > output.
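
The buffering tip could look roughly like this in Python. It is only a sketch under a few assumptions: the `JitterBuffer` class and its `prefill` parameter are hypothetical names, frames are stored as already-decoded PCM bytes, and packet IDs are assumed to increase by one per packet as described above. It waits for a few frames before starting playback, releases frames in packet ID order, and drops frames that arrive after their slot has passed.

```python
from typing import Dict, Optional


class JitterBuffer:
    """Holds decoded frames keyed by packet ID and releases them in order."""

    def __init__(self, prefill: int = 3) -> None:
        self.prefill = prefill              # frames to collect before playback starts
        self.frames: Dict[int, bytes] = {}  # packet ID -> decoded PCM frame
        self.next_id: Optional[int] = None  # packet ID expected next at the output
        self.started = False

    def push(self, packet_id: int, pcm: bytes) -> None:
        if self.next_id is None:
            self.next_id = packet_id        # the first packet seen sets the starting point
        if packet_id >= self.next_id:       # anything older is too late to play, drop it
            self.frames[packet_id] = pcm

    def pop(self) -> Optional[bytes]:
        """Return the next frame in order, or None if there is nothing to play."""
        if not self.started:
            if len(self.frames) < self.prefill:
                return None                 # still filling the buffer
            self.started = True
        if not self.frames:
            return None                     # underrun: wait for more packets
        # A missing packet ID yields None here; the caller can play silence for that slot.
        frame = self.frames.pop(self.next_id, None)
        self.next_id += 1
        return frame


# Usage: packets 10, 12, 11 arrive out of order and come out in order.
buf = JitterBuffer(prefill=3)
for pid, pcm in [(10, b"a"), (12, b"c"), (11, b"b")]:
    buf.push(pid, pcm)
ordered = [buf.pop(), buf.pop(), buf.pop()]
```

In a real client, `pop` would be called from the audio output callback at the packet rate, and you would keep one buffer per stream ID.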

The documentation is quite good, once you understand the binary encoding part of it. Do you have any more specific detail about where you are stuck?

zelloptt / zello-channel-api

Need help with decoding the incoming audio #191