jketterl / codecserver

Modular audio codec server
GNU General Public License v3.0

Document protocol / provide gRPC `service` definition #8

Closed thomastoye closed 1 year ago

thomastoye commented 2 years ago

Thanks for starting this project. I want to take it for a spin, but it's unclear how I should invoke it. I'm not clear whether the communication occurs over gRPC or if the protobufs are just used to encode data. I assume the latter based on this line in the README:

> It uses a protocol based on protocol buffers (or protobuf for short) to communicate, exchanging length-delimited Any-encapsulated messages on the sockets.

Could you document the protocol? I see the following messages can be received: Request, Check, ChannelData, SpeechData, Renegotiation.

For `[server:tcp]` with `port=1073` and `bind=::`, do I assume correctly that I make a TCP connection to port 1073, and that the protocol then works as follows (for AMBE->raw):

  1. Optional: I start by sending a Check with codec=ambe message, as a sanity check and to make sure we can connect to the server
  2. I send a Request message with Direction.DECODE. This will start a new session (and only one session can be in progress). I'm unsure why direction is a repeated field - does the order matter, or does it just set up the session to process both ChannelData and SpeechData? When I want to do both decoding and encoding, should I always alternate (ABABABAB), or can I interleave them as I want (e.g. AAABABBABAA)?
  3. I start sending SpeechData and I receive nothing back. The data is sent to the device asynchronously.
  4. When I want to read the decoded data, what do I do? I think the data is sent directly on the socket?

I am also unclear on the data format for SpeechData and ChannelData (which I assume is the decoded/unencoded speech data). I guess these will be codec- and device-dependent? I think FramingHint describes this; is that correct? For the ambe3k driver in particular, is the output encoded with PCM3500?

Would you consider using gRPC or adding a service definition to the .proto files? It would greatly simplify implementing a client thanks to code generation. Or, if you answer my questions here, I will gladly help to document the protocol in the README.

jketterl commented 2 years ago

As you already guessed, this project does not use gRPC at all. I'm not sure how I can make this clearer in the documentation other than by not mentioning it.

I came to protobuf because it seemed to be a good tool for serializing structured data, and a well-supported alternative to JSON, especially once binary data gets involved. Unfortunately, protobuf does not offer an equally well-supported solution when it comes to delimiting multiple messages sent over a serial data line (socket).

The solution offered repeatedly throughout the available resources is to simply pack each message inside an Any message, serialize that, and send the length of the resulting binary data followed by the data itself, so that's what this project does. This pattern seems to be somewhat commonly used. Protobuf bundles implementations of it for some languages, but unfortunately not for all of them; C++ is not among them, and interoperability with the other languages is yet to be tested.

Your understanding of the individual messages is largely correct. Typically, you'd start off by evaluating the server's Handshake, but I guess that's at your discretion.

The Check message is meant to be used as a separate call. I'm using that in OpenWebRX to detect if an AMBE codec is available before even showing the corresponding modes. If your intention is to decode data, you can start with a Request and inspect the resulting Response for success.

The direction is repeated because you can actually encode and decode on a single channel, at least if the hardware supports it. I have not done any experiments with encoding so far, so encoding support is purely theoretical at the moment. You should not need to pay attention to the interleaving on the client side; the ambe3k module passes the data to the chip in the way that's recommended in the data sheet.

If you send a request with Direction.DECODE, you should be sending ChannelData messages, since they represent the encoded data. The server should then, at some point, start responding with SpeechData messages on the socket. Transmitting and receiving should be uncoupled (non-blocking IO or threaded) for timing reasons.
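
One way to realize that decoupling is a background reader thread that drains the socket while the main thread keeps sending. This is a hypothetical sketch (`run_duplex` is not part of codecserver); protobuf framing is omitted and raw byte frames stand in for messages:

```python
import socket
import threading

def run_duplex(sock: socket.socket, frames: list) -> list:
    """Send all frames while a background thread drains replies,
    so a slow reader never blocks the writer (and vice versa)."""
    received = []

    def reader():
        # Drain the socket until the peer closes its side.
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            received.append(chunk)

    t = threading.Thread(target=reader)
    t.start()
    for frame in frames:
        sock.sendall(frame)
    sock.shutdown(socket.SHUT_WR)  # signal that we are done sending
    t.join()
    return received
```

In a real client you would keep the connection open and feed `ChannelData` messages continuously while the reader parses incoming `SpeechData`; the point is only that the two directions must not block each other.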

The actual data inside the data messages depends strongly on the codec used and its configuration. The FramingHint should give you some idea, since the data should be sliced according to the codec's requirements. There's no guarantee that this information will be available, though, since it cannot be queried from the hardware. Also, it may make sense to provide more meta information in the future.

For AMBE, the framing is done in 20ms intervals, so the ChannelData must contain the corresponding amount of bits / bytes for that amount of time. For example, DMR data in codec index 33 has a gross rate including FEC of 3600 bits/s, or 72 bits per frame. As such, a ChannelData message should hold 72 / 8 = 9 bytes of data. The FramingHint therefore should tell you that channelBits is 72 and channelBytes is 9. Codec index 34 (the same, but without FEC) has a gross data rate of 2450 bits/s, or 49 bits per frame. Since 49 doesn't divide evenly by 8, the number of bytes is rounded up to 7 in this case.

Also for AMBE, the audio data is provided as 16-bit raw PCM sampled at 8 kHz. That breaks down to 160 samples, or 320 bytes, for each individual SpeechData message.
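
The framing numbers above can be reproduced with a few lines of arithmetic (a sketch; the helper names are mine, and the rates for codec indices 33/34 are taken from the explanation above):

```python
import math

FRAME_MS = 20  # AMBE frames cover 20 ms of audio

def channel_framing(gross_bps: int):
    """Bits and bytes of encoded channel data per 20 ms frame."""
    bits = gross_bps * FRAME_MS // 1000
    return bits, math.ceil(bits / 8)  # partial bytes round up

def speech_framing(sample_rate: int = 8000, sample_width: int = 2):
    """Samples and bytes of 16-bit PCM speech data per 20 ms frame."""
    samples = sample_rate * FRAME_MS // 1000
    return samples, samples * sample_width
```

With these, DMR codec index 33 (3600 bits/s) yields 72 bits / 9 bytes per ChannelData message, index 34 (2450 bits/s) yields 49 bits / 7 bytes, and the speech side is 160 samples / 320 bytes per SpeechData message, matching the FramingHint values.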

I'll leave this open until the corresponding documentation is available. I'd prefer the wiki.

thomastoye commented 2 years ago

Hi @jketterl, posting here since I have no edit rights to the wiki. Could you proofread the below and add it to the wiki? Thanks.

--

# Protocol

## Communicating with a `codec-server`

### Message wire format

[Protobuf](https://developers.google.com/protocol-buffers/) is used for serialization. The `.proto` files can be found in [`src/lib/proto`](https://github.com/jketterl/codecserver/blob/develop/src/lib/proto).

Messages on the wire are preceded by an unsigned integer in [Protobuf Varint](https://developers.google.com/protocol-buffers/docs/encoding#varints) encoding. This indicates the length of the following message.

The message itself is packed in an [`Any`](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#google.protobuf.Any) message. This allows recovery of the type. In C++ and Java, `Coded{Input,Output}Stream`s may be used to simplify integration.

To receive a message without such support:

1. Read the `Varint` to know the expected length of the message
1. Read the message as an [`Any`](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#google.protobuf.Any)
1. Use the `type_url` to construct an object of the appropriate type

To send a message without such support:

1. Pack the message in an [`Any`](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#google.protobuf.Any) (e.g. `any = google.protobuf.any_pb2.Any(); any.Pack(my_msg)`)
1. Get the size in bytes of the message and encode it as a `Varint`
1. Send this size in bytes, then the message over the wire
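
The send and receive steps above can be sketched in Python. This is a minimal sketch that assumes the message has already been packed into an `Any` and serialized (e.g. via `any.SerializeToString()`); raw bytes stand in for the protobuf payload so the example stays dependency-free:

```python
import io

def write_delimited(stream, payload: bytes) -> None:
    """Write a varint length prefix followed by the serialized Any message."""
    length = len(payload)
    prefix = bytearray()
    while True:
        byte = length & 0x7F
        length >>= 7
        if length:
            prefix.append(byte | 0x80)  # continuation bit: more length bytes follow
        else:
            prefix.append(byte)
            break
    stream.write(bytes(prefix))
    stream.write(payload)

def read_delimited(stream) -> bytes:
    """Read one varint length prefix, then that many payload bytes."""
    length = 0
    shift = 0
    while True:
        (byte,) = stream.read(1)
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return stream.read(length)
```

On the receiving side you would then parse the returned bytes into an `Any` and use its `type_url` to pick the concrete message class, as described above.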

### Connecting

Connect to the server, either over TCP or over UNIX domain sockets. Check the server's `codecserver.conf` (`[server:...]` sections) to see which transports are enabled and what the port or socket path is.

### Setup

When connecting, the server will send you a `Handshake` message. You can use the `serverVersion` to ensure you have a compatible implementation.

To check if the server has support for the codec you want to use, send a `Check` message. If the codec is supported, you will receive a `Response` message with `Status: OK`. You can also immediately try to start a session with a codec and see whether that succeeds; `Check` is mostly meant as a stand-alone message to find out which codecs are supported.

### Starting a session

To start a session, send a `Request` with the `codec` you want to use. Select the directions you want to use (encoding, decoding, or both); some devices may not implement both (software devices in particular). The codec you want to use may require you to provide `args`; these are codec-specific. For example, the `ambe` codec can take a `rate` or a `ratep` string.

In the `Response`, the server may indicate the expected framing for the codec and argument you selected. This describes the number of bits expected for encoded/decoded samples.

### Coding audio

To encode or decode audio, send `SpeechData` or `ChannelData`, respectively. Coding is an asynchronous process; do not wait for a reply synchronously.

To encode data, send `SpeechData` messages to the server. At some point you will receive `ChannelData` back; this is the encoded data. To decode data, send `ChannelData` messages to the server. At some point you will receive `SpeechData` back; this is the decoded data.

The format of both `SpeechData` and `ChannelData` will be specific to the codec and its arguments.

thomastoye commented 2 years ago

Hi @jketterl. I still have not been able to figure out how to communicate with codecserver. I get `unexpected packet received from stick` in the output of codecserver. I created an example repo with a client I'm trying to write at https://github.com/thomastoye/codecserver-client-example/blob/master/main.py

When I run it, I see the following from codecserver:

$ docker run -p 1073:1073 -it --rm -v $PWD/codecserver:/etc/codecserver --device /dev/ttyUSB0 jketterl/codecserver:latest
Hello, I'm the codecserver.
now scanning for modules...
registering new driver: "ambe3k"
loading devices from configuration...
Product id: AMBE3000R; Version: V120.E100.XXXX.C106.G514.R009.B0010411.C0020208
detected AMBE3000, creating one channel
registering new device for codecs: ambe, 
auto-detecing devices...
scanning for "ambe3k" devices...
device scan complete.
check for codec: ambe
client requests codec ambe
starting new session on channel 0
renegotiating: direction: decode enccode; index: 33
channel 0: init response received; RateT response received; 
unexpected packet received from stick

Output from my code:

$ python main.py
number of WAV packets: 2551
line break pos 1 data length (75, 1)
[type.googleapis.com/CodecServer.proto.Handshake] {
  serverName: "codecserver"
  serverVersion: "0.2.0-dev"
}

Got a handshake from the server. Server name: codecserver, version 0.2.0-dev
send check
Received: b'0\n.type.googleapis.com/CodecServer.proto.Response'
line break pos 1 data length (48, 1)
[type.googleapis.com/CodecServer.proto.Response] {
}

Got a response from the server. Status: 0, message: , framing hint: 
Received: b'>\n.type.googleapis.com/CodecServer.proto.Response\x12\x0c\x1a\n\x08H\x10\t\x18\xa0\x01 \xc0\x02'
line break pos 1 data length (62, 1)
[type.googleapis.com/CodecServer.proto.Response] {
  framing {
    channelBits: 72
    channelBytes: 9
    audioSamples: 160
    audioBytes: 320
  }
}

Got a response from the server. Status: 0, message: , framing hint: channelBits: 72
channelBytes: 9
audioSamples: 160
audioBytes: 320

(It sends the first audio packet but then hangs as it does not receive a reply)

jketterl commented 1 year ago

I just came back to this after some time... I do have to apologize, I didn't really pay much attention to this project once it had reached a state that was good enough for what I needed.

I checked out your example and got the same results. Looking at the code, I found that channel packets (that's packets containing encoded speech data from the stick) were not handled at all in one particular spot in the code.

I have now added the missing implementation, and I believe your example now works better; at least it looks like it is actually receiving data, and it also completes the example file all the way to the end.

Also, on the documentation: thank you for writing up the protocol documentation. I have added it as a new page on the wiki, and I'm going to do the proofreading there in a minute. The wiki is also open for editing, so feel free to add things there if necessary.