I saw the suggestion of introducing audio compression to Jamulus
Jamulus uses the low latency OPUS audio codec. So it uses audio compression.
I wonder if it would be possible to use graphics cards for fast audio compression.
The OPUS encoder is very efficient. I can run Jamulus on a Raspberry Pi Zero in real-time. So any modern PC should not have any problems encoding the OPUS packets. Therefore it makes no sense to use the graphics card.
Ok, sounds good, although it was gating that was the main focus of this suggestion. Similar to gates used on audio desks. Drop audio packets if the audio is below a dB threshold, with attack, hold and decay delay values. It should be possible to set reasonable default values based on the characteristics of the client's audio feed.
If someone is not playing or listening, or they've muted themselves, you save on internet traffic and may improve latency as a result. Sometimes it's the simplest algorithm tweaks that give the biggest gains.
I think that if the client drops audio packets it would break the recorder feature, as I think it relies on a constant stream of data. Am I right @pljones?
You'd need to send a packet saying "there was silence" -- it could be very short compared with an OPUS packet, but it would break every client and server currently in use, so it would need to be introduced in a way that was backwards compatible (i.e. client and server negotiate that both ends support it). The server would then "magically create" a frame of audio silence, the recorder would record it and the mixer would mix it (or that could get skipped).
The client would have to remain in sync with its own samples, of course - it would have to read a frame and either send that frame or the "no audio" frame each time. There's no option of sending nothing, or you're essentially "dropping frames", which is bad.
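As a rough illustration of that idea (not the actual Jamulus protocol), here is a minimal C++ sketch, assuming a hypothetical one-byte frame-type header and a capability flag negotiated between client and server:

// Hypothetical sketch only: packet layout, enum and function names are
// assumptions, not the real Jamulus protocol.
#include <cstdint>
#include <vector>

enum class EFrameKind : uint8_t { OpusAudio = 0, SilenceMarker = 1 };

// Client side: exactly one packet is emitted per audio frame so nothing is
// ever "dropped"; a closed gate just sends a one-byte marker instead of the
// coded audio data.
std::vector<uint8_t> BuildPacket ( const std::vector<uint8_t>& vecOpusData,
                                   bool bGateClosed, bool bPeerSupportsMarker )
{
    if ( bGateClosed && bPeerSupportsMarker )
    {
        return { static_cast<uint8_t> ( EFrameKind::SilenceMarker ) };
    }

    std::vector<uint8_t> vecPacket { static_cast<uint8_t> ( EFrameKind::OpusAudio ) };
    vecPacket.insert ( vecPacket.end(), vecOpusData.begin(), vecOpusData.end() );
    return vecPacket;
}

// Server side: a marker is expanded into a frame of digital silence so the
// mixer and the recorder still see a constant stream of samples.
std::vector<int16_t> ExpandSilenceFrame ( int iFrameSizeSamples )
{
    return std::vector<int16_t> ( iFrameSizeSamples, 0 );
}

The point of the sketch is only that the client still sends one (possibly tiny) packet per frame, so the receiver's timing and the recorder's constant stream are preserved.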
Another issue here is how you handle transitions from, say -45dB to -55dB if -48dB is your gate threshold. Do you suddenly cut and cause the waveform to break or do you gradually transition? You really want to avoid those sudden drops as they make more noise you don't want. And the fade out means adding new audio processing to the client. You'd also need to fade back in when you transition to sending real frames.
Remember, things like video conferencing are not real time -- there's a large buffer. I've experienced up to three or four seconds (sitting next to someone at work back when we were in the office...) and that's plenty of time to process the video and audio signals. Jamulus does not have the luxury of working like this - it's processing one frame (64 or 128 samples) at a time, then sending it, forgetting it and moving on.
There's also another drawback to this, which is people with non-optimal setups. I know a drummer who communicates with us through the overhead microphone, so we get his voice really low. The gate could possibly cut his voice. If this were done, it would be nice to let people disable it, to avoid problems with such cases.
Interesting. Thanks for the detail. Not knowing the protocol, I assumed that since it was UDP, it would need to cope with dropped packets and have some sort of sync/timing mechanism, so that a dropped packet is equivalent to silence anyway. I understand that it is rare for UDP packets to be dropped in practice, so it makes sense. I'm not familiar with OPUS and not an audio engineer either, but have some amateur understanding.
With audio gating you have a three-phase transition. The "attack" phase is usually very short and ramps up the audio from "silence" over that time period. You then "hold" the gate open if the level drops below the threshold, for a longer period, maybe a few hundred milliseconds. If the level hasn't risen again in that period you transition to decay, where the audio fades out over another, shorter time period. These three phases are configurable.
The attack can prevent sudden noises or crashes, but if set too long can lose the attack you desire in your instrument. Drums need a very short attack period. You'd have it longer for voice. Hold is probably about avoiding dropping the gate too often, making it stuttery. Decay, I'm not sure, but probably doesn't matter too much unless it is too short.
You would probably set very conservative gates by default and allow modification for advanced users.
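To make the three phases concrete, here is a minimal per-frame noise gate sketch in C++, assuming 48 kHz float audio; the class name, the -48 dB default threshold and the attack/hold/decay times are illustrative assumptions, not values taken from Jamulus:

// Minimal noise gate sketch with attack, hold and decay; all defaults are
// illustrative assumptions.
#include <algorithm>
#include <cmath>
#include <cstddef>

class CNoiseGate
{
public:
    CNoiseGate ( double dThresholdDb = -48.0, double dAttackMs = 5.0,
                 double dHoldMs = 200.0, double dDecayMs = 50.0,
                 double dSampleRate = 48000.0 )
        : dThreshold ( std::pow ( 10.0, dThresholdDb / 20.0 ) ),
          dAttackStep ( 1000.0 / ( dAttackMs * dSampleRate ) ),
          dDecayStep ( 1000.0 / ( dDecayMs * dSampleRate ) ),
          iHoldSamples ( static_cast<int> ( dHoldMs * dSampleRate / 1000.0 ) )
    {
    }

    // Applies the gate in place and returns true if the whole frame came out
    // silent (i.e. it could be replaced by a "no audio" marker).
    bool Process ( float* pFrame, std::size_t iFrameSize )
    {
        bool bAllSilent = true;

        for ( std::size_t i = 0; i < iFrameSize; i++ )
        {
            if ( std::fabs ( pFrame[i] ) >= dThreshold )
            {
                iHoldCounter = iHoldSamples; // signal present: restart hold
            }

            if ( iHoldCounter > 0 )
            {
                iHoldCounter--;
                dGain = std::min ( 1.0, dGain + dAttackStep ); // attack ramp up
            }
            else
            {
                dGain = std::max ( 0.0, dGain - dDecayStep ); // decay ramp down
            }

            pFrame[i] = static_cast<float> ( pFrame[i] * dGain );

            if ( dGain > 0.0 )
            {
                bAllSilent = false;
            }
        }

        return bAllSilent;
    }

private:
    double dThreshold;   // linear amplitude threshold
    double dAttackStep;  // per-sample gain increase while open
    double dDecayStep;   // per-sample gain decrease after hold expires
    int    iHoldSamples; // how long to stay open after the level drops
    int    iHoldCounter = 0;
    double dGain        = 0.0;
};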
One frame for Jamulus is 64 samples at 48,000 samples per second, which is 1.333ms. Users expect round trip latency of around 40ms maximum - adding any local buffering wouldn't be acceptable.
So you're going to have to "remember" that 1.333ms of audio in some new gating engine running alongside the existing client thread. When that new engine thinks "Ah, cutting transmission would be good now", it would need to signal the existing audio engine that the next frame is the last "normal" one and the ones after it will be sent by the gating engine (either during "attack" for the ramps or "hold" for the shortened "no audio"). Once it decides, "Oh, we have signal again", it would ramp again and then signal the existing engine it was handing back control.
It might not be necessary (or desirable) to have the two "engines" as actual threads, I'm talking "schematically", really.
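Schematically, that hand-over could be a small state machine checked once per frame; the state names below are just an illustration of the idea, not anything that exists in the Jamulus code:

// Illustrative transmit-path states for the hand-over described above
enum class ETxState
{
    Normal,   // existing engine sends real OPUS frames
    RampDown, // gating engine fades the last frames out
    Silent,   // gating engine sends the short "no audio" frames
    RampUp    // gating engine fades back in, then hands control back
};

ETxState NextState ( ETxState eState, bool bSignalPresent, bool bRampFinished )
{
    switch ( eState )
    {
    case ETxState::Normal:
        return bSignalPresent ? ETxState::Normal : ETxState::RampDown;

    case ETxState::RampDown:
        if ( bSignalPresent )
        {
            return ETxState::RampUp; // signal came back during the fade out
        }
        return bRampFinished ? ETxState::Silent : ETxState::RampDown;

    case ETxState::Silent:
        return bSignalPresent ? ETxState::RampUp : ETxState::Silent;

    case ETxState::RampUp:
        return bRampFinished ? ETxState::Normal : ETxState::RampUp;
    }

    return eState; // not reached
}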
A suitable audio input gate can be virtually unnoticeable but cut out a considerable amount of data.
What is the actual use case for this?
The reason is in the original suggestion. It is just a suggestion to reduce unnecessary data being sent. I observed lots of audio packets being sent when nothing is happening; the obvious thing would be to prune the useless data.
Ok, but why should I reduce data traffic? My internet connection must support the maximum data rate anyway so when I play I need that bandwidth. But when I am in a session, I usually play my instrument. Or are you referring to the case that you have a "listen only" client? For that, we have a separate issue: https://github.com/corrados/jamulus/issues/160
Ok, but why should I reduce data traffic?
As a developer I have seen many instances of code using unnecessary CPU or resources because of inappropriate algorithms, making applications many times slower than they could be, so I have some interest in this.
For internet access, the answer is not always to increase bandwidth; that is not always available. My current ISP is bad for this and I'm looking to change.
My internet connection must support the maximum data rate anyway so when I play I need that bandwidth. But when I am in a session, I usually play my instrument.
Instruments do not always play 100% of the time. Trumpet parts in a band are often sparser than say a bass part.
Or are you referring to the case that you have a "listen only" client? For that, we have a separate issue: #160
No, but they could be solved by the same feature.
I have no knowledge of the workings of this software, so if cost vs benefit is not worth it, then thanks for giving the idea some consideration.
if cost vs benefit is not worth it
Well, that is why I was asking for a specific use case.
I can see it adding complexity with little benefit (if any).
Destroyed.
If you feed digital silence into the Opus encoder, the bandwidth of the audio data drops to almost nothing (as long as you don't use hard-cbr mode).
$ ./silentwav.py | opusenc --bitrate 256 --framesize 60 --cvbr - /dev/null
Encoding using libopus 1.3 (audio)
-----------------------------------------------------
Input: 48kHz 1 channel
Output: 1 channel (1 uncoupled)
60ms packets, 256kbit/sec CVBR
Preskip: 312
Encoding complete
-----------------------------------------------------
Encoded: 1 minute and 0.02 seconds
Runtime: 1 seconds
(60.02x realtime)
Wrote: 11546 bytes, 1001 packets, 65 pages
Bitrate: 1.06671kbit/s (without overhead)
Instant rates: 0.4kbit/s to 1.06667kbit/s
(3 to 8 bytes per packet)
Overhead: 30.7% (container+metadata)
You still have the overhead of the Opus packets, but even that shouldn't be a problem (now with a 2.5 ms frame size):
$ ./silentwav.py | opusenc --bitrate 256 --framesize 2.5 --cvbr - /dev/null
Encoding using libopus 1.3 (low-delay)
-----------------------------------------------------
Input: 48kHz 1 channel
Output: 1 channel (1 uncoupled)
2.5ms packets, 256kbit/sec CVBR
Preskip: 120
Encoding complete
-----------------------------------------------------
Encoded: 1 minute and 0.0025 seconds
Runtime: 3 seconds
(20x realtime)
Wrote: 99410 bytes, 24001 packets, 97 pages
Bitrate: 9.6kbit/s (without overhead)
Instant rates: 9.6kbit/s to 9.6kbit/s
(3 to 3 bytes per packet)
Overhead: 27.6% (container+metadata)
I'm not sure if and how much additional overhead is added by Jamulus though.
I assume that Jamulus would pass digital silence on its input channels unchanged to the Opus encoder. So it should be possible to use a gate between the instrument and Jamulus and reduce the bandwidth and CPU usage. It shouldn't be too hard to do with jackd; no idea how that would work on Windows and macOS.
Am I missing something?
(Encoding was done on a Raspberry Pi 2)
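The same observation can be reproduced directly with the libopus C API (usable from C++), assuming the opus development headers are installed; with VBR left on, frames of pure digital silence encode to only a few bytes:

// Feed frames of digital silence to libopus and print the packet sizes.
// Build with e.g.: g++ silence_size.cpp -lopus
#include <cstdio>
#include <opus/opus.h>
#include <vector>

int main()
{
    int          iError = 0;
    OpusEncoder* pEnc   = opus_encoder_create ( 48000, 1,
                                                OPUS_APPLICATION_RESTRICTED_LOWDELAY,
                                                &iError );
    if ( iError != OPUS_OK )
    {
        return 1;
    }

    opus_encoder_ctl ( pEnc, OPUS_SET_BITRATE ( 256000 ) );
    opus_encoder_ctl ( pEnc, OPUS_SET_VBR ( 1 ) ); // i.e. not hard CBR

    const int                  iFrameSize = 120; // 2.5 ms at 48 kHz
    std::vector<opus_int16>    vecPcm ( iFrameSize, 0 ); // digital silence
    std::vector<unsigned char> vecPacket ( 1500 );

    for ( int i = 0; i < 5; i++ )
    {
        const int iBytes = opus_encode ( pEnc, vecPcm.data(), iFrameSize,
                                         vecPacket.data(),
                                         static_cast<opus_int32> ( vecPacket.size() ) );
        std::printf ( "packet %d: %d bytes\n", i, iBytes );
    }

    opus_encoder_destroy ( pEnc );
    return 0;
}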
https://github.com/corrados/jamulus/blob/master/src/client.cpp#L105
Comment says what but not why.
When receiving audio packets, the size of the packet is checked. So the size must be fixed.
And why is the packet size checked?
Because the jitter buffer works on coded data and the size of the blocks in the jitter buffer must not change.
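As an illustration of why that matters (this is not the actual Jamulus code), a jitter buffer that stores fixed-size coded blocks can only accept packets of exactly the negotiated size, so a gate that shrank some packets would look like packet loss:

// Illustrative fixed-block jitter buffer: packets of any other size are rejected.
#include <cstdint>
#include <cstring>
#include <vector>

class CFixedBlockJitterBuffer
{
public:
    CFixedBlockJitterBuffer ( std::size_t iNewBlockSize, std::size_t iNumBlocks )
        : iBlockSize ( iNewBlockSize ),
          vecMemory ( iNewBlockSize * iNumBlocks, 0 )
    {
    }

    // Only blocks of exactly the negotiated size are accepted
    bool Put ( const uint8_t* pData, std::size_t iSize )
    {
        if ( iSize != iBlockSize )
        {
            return false; // size mismatch is treated like a lost packet
        }

        std::memcpy ( vecMemory.data() + iWritePos * iBlockSize, pData, iBlockSize );
        iWritePos = ( iWritePos + 1 ) % ( vecMemory.size() / iBlockSize );
        return true;
    }

private:
    std::size_t          iBlockSize;
    std::vector<uint8_t> vecMemory;
    std::size_t          iWritePos = 0;
};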
I was looking at the OPUS documentation, and the parameters related to noise optimization seem to be related to DTX mode (which seems to be addressed in Jamulus' code already), which is targeted at VoIP transmissions and the insertion of comfort noise (CNG) during silence. I couldn't find any other parameters that might help the codec reduce bandwidth usage during silent periods.
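For reference, DTX is a standard libopus encoder option set via opus_encoder_ctl; the snippet below only shows how it would be enabled and does not claim to reflect where or whether Jamulus sets it:

// Enable DTX on a libopus encoder (real libopus API; the surrounding code
// is just a minimal example).
#include <opus/opus.h>

int main()
{
    int          iError = 0;
    OpusEncoder* pEnc   = opus_encoder_create ( 48000, 1, OPUS_APPLICATION_VOIP, &iError );
    if ( iError != OPUS_OK )
    {
        return 1;
    }

    // With DTX on, the encoder marks silent stretches so the decoder can
    // fill them with comfort noise (CNG) instead of full audio frames.
    opus_encoder_ctl ( pEnc, OPUS_SET_DTX ( 1 ) );

    opus_encoder_destroy ( pEnc );
    return 0;
}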
Another way to reduce the data exchanged may be to kill the down mix on secondary client instances, as discussed in https://github.com/corrados/jamulus/issues/96#issuecomment-616137844. No idea of the impact on the clients and server in terms of performance, but for sure those additional mixed streams are not needed when running multiple clients on the same PC/Mac.
This Issue is very old. Shouldn't we close it?
No comment yet -> closed.
When I was observing the Jamulus protocol in Wireshark, I noticed that audio packets are sometimes sent even when no one is connected, or when there is very little sound input. This made me think of gating. A suitable audio input gate can be virtually unnoticeable but cut out a considerable amount of data. I think VoIP methods probably do this, or maybe it is part of the compression scheme.
Gating might be easier/faster than using compression.
I saw the suggestion of introducing audio compression to Jamulus; I thought it might not use compression due to the extra delays it might introduce.
It may not be suitable for Jamulus but I thought I should raise it, in case it would help.
Another wild thought: I wonder if it would be possible to use graphics cards for fast audio compression.