jamulussoftware / jamulus

Jamulus enables musicians to perform real-time jam sessions over the internet.
https://jamulus.io

Use a gate mechanism to reduce audio data #270

Closed WildZed closed 4 years ago

WildZed commented 4 years ago

While observing the Jamulus protocol in Wireshark, I noticed that audio packets are sometimes sent even when no one is connected, or when there is very little sound input. This made me think of gating. A suitable audio input gate can be virtually unnoticeable but cut out a considerable amount of data. I think VoIP systems probably do this, or maybe it is part of the compression scheme.

Gating might be easier/faster than using compression.

I saw the suggestion of introducing audio compression to Jamulus; I thought it might not use compression because of the extra delay it could introduce.

It may not be suitable for Jamulus but I thought I should raise it, in case it would help.

Another wild thought: I wonder if it would be possible to use graphics cards for fast audio compression.

corrados commented 4 years ago

I saw the suggestion of introducing audio compression to Jamulus

Jamulus uses the low latency OPUS audio codec. So it uses audio compression.

I wonder if it would be possible to use graphics cards for fast audio compression.

The OPUS encoder is very efficient. I can run Jamulus on a Raspberry Pi Zero in real-time. So any modern PC should not have any problems encoding the OPUS packets. Therefore it makes no sense to use the graphics card.

WildZed commented 4 years ago

Ok, sounds good, although it was gating that was the main focus of this suggestion. Similar to gates used on audio desks. Drop audio packets if the audio is below a dB threshold, with attack, hold and decay delay values. It should be possible to set reasonable default values based on the characteristics of the client's audio feed.

If someone is not playing, is just listening, or has muted themselves, you save on internet traffic and may improve latency as a result. Sometimes the simplest algorithm tweaks give the biggest gains.

Snayler commented 4 years ago

I think that if the client drops audio packets it would break the recorder feature, as I think it relies on a constant stream of data. Am I right @pljones?

pljones commented 4 years ago

You'd need to send a packet saying "there was silence" -- it could be very short compared with an OPUS packet, but it would break every client and server currently in use, so it would need introducing in a way that was backwards compatible (i.e. client and server negotiating that both ends support it). The server would then "magically create" a frame of audio silence, the recorder would record it and the mixer would mix it (or that could get skipped).

The client would have to remain in sync with its own samples, of course - it would have to read a frame and either send that frame or the "no audio" frame each time. There's no option of sending nothing, or you're essentially "dropping frames", which is bad.
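
To make that concrete, here is a purely hypothetical receive-side sketch in C++ (SILENCE_MARKER, kFrameSamples and the packet layout are all invented for illustration; no such message exists in Jamulus today):

#include <cstddef>
#include <vector>

// Invented illustration of the idea: after negotiation, a one-byte marker
// replaces the full OPUS packet for silent frames, and the receiver
// "magically creates" a frame of decoded silence for the mixer/recorder.
constexpr unsigned char SILENCE_MARKER = 0x00;  // hypothetical marker byte
constexpr std::size_t   kFrameSamples  = 64;    // one mono frame at 48 kHz

std::vector<short> decodeOrSynthesize(const unsigned char* packet,
                                      std::size_t packetLen) {
    if (packetLen == 1 && packet[0] == SILENCE_MARKER) {
        return std::vector<short>(kFrameSamples, 0);  // synthesized silence
    }
    // Otherwise this would fall through to the normal OPUS decode path
    // (omitted in this sketch).
    return {};
}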

Another issue here is how you handle transitions from, say -45dB to -55dB if -48dB is your gate threshold. Do you suddenly cut and cause the waveform to break or do you gradually transition? You really want to avoid those sudden drops as they make more noise you don't want. And the fade out means adding new audio processing to the client. You'd also need to fade back in when you transition to sending real frames.

Remember, things like video conferencing are not real time -- there's a large buffer. I've experienced up to three or four seconds (sitting next to someone at work back when we were in the office...), and that's plenty of time to process the video and audio signals. Jamulus does not have the luxury of working like this - it's processing one frame (64 or 128 samples) at a time, then sending it, forgetting it and moving on.

Snayler commented 4 years ago

There's another drawback to this: people with non-optimal setups. I know a drummer who communicates with us through the overhead microphone, so we get his voice really low. The gate could possibly cut his voice. If this were implemented, it would be nice to let people disable it, to avoid problems in such cases.

WildZed commented 4 years ago

Interesting. Thanks for the detail. Not knowing the protocol, I assumed that since it was UDP, it would need to cope with dropped packets and have some sort of sync/timing mechanism, so that a dropped packet would be equivalent to silence anyway. I understand that it is rare for UDP packets to be dropped in practice, so that makes sense. I'm not familiar with OPUS and I'm not an audio engineer either, but I have some amateur understanding.

With audio gating you have a three-phase transition: the "attack" phase is usually very short and ramps the audio up from "silence" over that time period. You then "hold" the gate open for a longer period, maybe a few hundred milliseconds, if the level drops below the threshold. If the level hasn't risen again in that period you transition to decay, where the audio fades out over another, shorter time period. All three phases are configurable.

The attack can prevent sudden noises or crashes, but if set too long it can lose the attack you want from your instrument. Drums need a very short attack period; you'd use a longer one for voice. Hold is mostly about avoiding the gate dropping too often and making the sound stuttery. Decay I'm less sure about, but it probably doesn't matter much unless it is too short.
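
To make the phases concrete, a minimal per-frame gate sketch in C++ (not Jamulus code; the parameters, the per-sample peak detection and the class layout are all assumptions for illustration):

#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical per-frame noise gate with the attack/hold/decay behaviour
// described above. All parameter choices are illustrative.
class NoiseGate {
public:
    NoiseGate(double sampleRate, double thresholdDb,
              double attackMs, double holdMs, double decayMs)
        : threshold(std::pow(10.0, thresholdDb / 20.0)),
          attackStep(1.0 / (sampleRate * attackMs / 1000.0)),
          decayStep(1.0 / (sampleRate * decayMs / 1000.0)),
          holdSamples(static_cast<long>(sampleRate * holdMs / 1000.0)) {}

    // Apply the gate in place; returns false if the whole frame ended up
    // silent (i.e. the client could skip sending it).
    bool process(float* frame, std::size_t numSamples) {
        bool anySignal = false;
        for (std::size_t i = 0; i < numSamples; ++i) {
            if (std::fabs(frame[i]) >= threshold) {
                holdCounter = holdSamples;                 // signal: reset hold
            }
            if (holdCounter > 0) {
                --holdCounter;
                gain = std::min(1.0, gain + attackStep);   // attack: ramp up
            } else {
                gain = std::max(0.0, gain - decayStep);    // decay: ramp down
            }
            frame[i] = static_cast<float>(frame[i] * gain);
            anySignal = anySignal || gain > 0.0;
        }
        return anySignal;
    }

private:
    double threshold;      // linear amplitude for the dB threshold
    double attackStep;     // gain increase per sample while the gate is open
    double decayStep;      // gain decrease per sample once hold has expired
    long   holdSamples;    // how long to keep the gate open after signal stops
    long   holdCounter = 0;
    double gain = 0.0;     // envelope gain, 0.0 (closed) .. 1.0 (open)
};

A real gate would more likely track an RMS level over a short window rather than per-sample peaks, but the attack/hold/decay behaviour maps directly onto the three parameters above.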

WildZed commented 4 years ago

You would probably set very conservative gates by default and allow modification for advanced users.

pljones commented 4 years ago

One frame for Jamulus is 64 samples at 48,000 samples per second, which is 1.333ms. Users expect round trip latency of around 40ms maximum - adding any local buffering wouldn't be acceptable.

So you're going to have to "remember" that 1.333ms of audio in some new gating engine running alongside the existing client thread. When that new engine decides "ah, cutting transmission would be good now", it would need to signal the existing audio engine that the next frame is the last "normal" one and that the ones after it will be sent by the gating engine (either during "attack" for the ramps or during "hold" for the shortened "no audio" frames). Once it decides "oh, we have signal again", it would ramp up again and then signal the existing engine that it was handing back control.

It might not be necessary (or desirable) to have the two "engines" as actual threads, I'm talking "schematically", really.
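
Schematically, the per-frame handoff could look like a small state machine (a sketch only; the states and names are invented, not a design):

// Invented illustration of the handoff between "normal" transmission and
// the gating engine, one transition per 64-sample frame.
enum class TxState {
    Normal,     // sending real OPUS frames
    RampDown,   // gate closing: fade out over the ramp
    Hold,       // gate closed: send the short "no audio" frames
    RampUp      // signal returned: fade back in, then hand back control
};

TxState next(TxState s, bool signalPresent, bool rampFinished) {
    switch (s) {
    case TxState::Normal:   return signalPresent ? TxState::Normal : TxState::RampDown;
    case TxState::RampDown: return signalPresent ? TxState::RampUp
                                                 : (rampFinished ? TxState::Hold
                                                                 : TxState::RampDown);
    case TxState::Hold:     return signalPresent ? TxState::RampUp : TxState::Hold;
    case TxState::RampUp:   return rampFinished ? TxState::Normal : TxState::RampUp;
    }
    return s;
}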

corrados commented 4 years ago

A suitable audio input gate can be virtually unnoticeable but cut out a considerable amount of data.

What is the actual use case for this?

WildZed commented 4 years ago

A suitable audio input gate can be virtually unnoticeable but cut out a considerable amount of data.

What is the actual use case for this?

The reason is in the original suggestion: it is simply a suggestion to reduce unnecessary data sending. I observed lots of audio packets being sent when nothing is happening; the obvious thing would be to prune the useless data.

corrados commented 4 years ago

Ok, but why should I reduce data traffic? My internet connection must support the maximum data rate anyway, so when I play I need that bandwidth. And when I am in a session, I usually play my instrument. Or are you referring to the case where you have a "listen only" client? For that, we have a separate issue: https://github.com/corrados/jamulus/issues/160

WildZed commented 4 years ago

Ok, but why should I reduce data traffic?

As a developer I have seen many instances of code using unnecessary CPU or resources because of inappropriate algorithms, making applications many times slower than they could be, so I have some interest in this.

On the internet side, the answer is not always to increase bandwidth; more is not always available. My current ISP is bad for this and I'm looking to change.

My internet connection must support the maximum data rate anyway so when I play I need that bandwidth. But when I am in a session, I usually play my instrument.

Instruments do not always play 100% of the time. Trumpet parts in a band are often sparser than, say, a bass part.

Or are you referring to the case that you have a "listen only" client? For that, we have a separate issue: #160

No, but they could be solved by the same feature.

I have no knowledge of the workings of this software, so if cost vs benefit is not worth it, then thanks for giving the idea some consideration.

corrados commented 4 years ago

if cost vs benefit is not worth it

Well, that is why I was asking for a specific use case.

pljones commented 4 years ago

I can see it adding complexity with little benefit (if any).

WildZed commented 4 years ago

I can see it adding complexity with little benefit (if any).

Destroyed.

streaps commented 4 years ago

If you feed digital silence into the Opus encoder, the bandwidth of the audio data drops to almost nothing (as long as you don't use hard-cbr mode).

$ ./silentwav.py | opusenc --bitrate 256 --framesize 60 --cvbr - /dev/null
Encoding using libopus 1.3 (audio)
-----------------------------------------------------
   Input: 48kHz 1 channel
  Output: 1 channel (1 uncoupled)
          60ms packets, 256kbit/sec CVBR
 Preskip: 312

Encoding complete                                
-----------------------------------------------------
       Encoded: 1 minute and 0.02 seconds
       Runtime: 1 seconds
                (60.02x realtime)
         Wrote: 11546 bytes, 1001 packets, 65 pages
       Bitrate: 1.06671kbit/s (without overhead)
 Instant rates: 0.4kbit/s to 1.06667kbit/s
                (3 to 8 bytes per packet)
      Overhead: 30.7% (container+metadata)

You still have the per-packet overhead of the Opus packets, but even that shouldn't be a problem (now with a 2.5 ms frame size):

$ ./silentwav.py | opusenc --bitrate 256 --framesize 2.5 --cvbr - /dev/null
Encoding using libopus 1.3 (low-delay)
-----------------------------------------------------
   Input: 48kHz 1 channel
  Output: 1 channel (1 uncoupled)
          2.5ms packets, 256kbit/sec CVBR
 Preskip: 120

Encoding complete                                  
-----------------------------------------------------
       Encoded: 1 minute and 0.0025 seconds
       Runtime: 3 seconds
                (20x realtime)
         Wrote: 99410 bytes, 24001 packets, 97 pages
       Bitrate: 9.6kbit/s (without overhead)
 Instant rates: 9.6kbit/s to 9.6kbit/s
                (3 to 3 bytes per packet)
      Overhead: 27.6% (container+metadata)

I'm not sure if and how much additional overhead is added by Jamulus though.

I assume that Jamulus would pass digital silence on its input channels unchanged to the Opus encoder. So it should be possible to use a gate between the instrument and Jamulus and reduce the bandwidth and CPU usage. It shouldn't be too hard to do with jackd; no idea how that would work on Windows and macOS.

Am I missing something?

(Encoding was done on a Raspberry Pi 2)
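
The measurement can also be reproduced directly against libopus, without opusenc or the Ogg container overhead (a hedged sketch, compiled with -lopus; Jamulus itself configures OPUS differently, so this only demonstrates the codec's behaviour on digital silence):

#include <opus/opus.h>
#include <cstdio>
#include <vector>

int main() {
    int err = 0;
    // 48 kHz mono encoder in VBR mode, mirroring the opusenc run above.
    OpusEncoder* enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_AUDIO, &err);
    if (err != OPUS_OK) return 1;
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(256000));
    opus_encoder_ctl(enc, OPUS_SET_VBR(1));

    std::vector<opus_int16> silence(2880, 0);   // 60 ms of digital silence
    unsigned char packet[4000];
    opus_int32 len = opus_encode(enc, silence.data(), 2880,
                                 packet, sizeof(packet));
    std::printf("silent 60 ms frame encoded to %d bytes\n", len);

    opus_encoder_destroy(enc);
    return 0;
}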

corrados commented 4 years ago

https://github.com/corrados/jamulus/blob/master/src/client.cpp#L105

WildZed commented 4 years ago

https://github.com/corrados/jamulus/blob/master/src/client.cpp#L105

Comment says what but not why.

corrados commented 4 years ago

When receiving audio packets, the size of the packet is checked. So the size must be fixed.

streaps commented 4 years ago

And why is the packet size checked?

corrados commented 4 years ago

Because the jitter buffer works on coded data and the size of the blocks in the jitter buffer must not change.
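
In other words, the buffer stores the coded packets themselves in slots of one fixed size, so a packet of any other length has nowhere to go. An illustrative sketch (not the actual Jamulus buffer code; the class and names are invented):

#include <cstring>
#include <vector>

// Illustration only: a jitter buffer holding *coded* audio blocks in
// fixed-size slots, which is why clients must send constant-size packets.
class FixedSizeJitterBuffer {
public:
    FixedSizeJitterBuffer(std::size_t blockSize, std::size_t numBlocks)
        : blockSize(blockSize), storage(blockSize * numBlocks) {}

    // A packet only fits if it has exactly the negotiated size.
    bool put(const unsigned char* data, std::size_t len, std::size_t slot) {
        if (len != blockSize || (slot + 1) * blockSize > storage.size()) {
            return false;  // wrong size: cannot be placed in a fixed slot
        }
        std::memcpy(&storage[slot * blockSize], data, len);
        return true;
    }

private:
    std::size_t blockSize;
    std::vector<unsigned char> storage;
};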

WolfganP commented 4 years ago

I was looking at the OPUS documentation and the parameters related to noise optimization seem to relate to DTX mode (which appears to be addressed in Jamulus' code already). DTX is targeted at VoIP transmissions and the insertion of comfort noise (CNG) during silence. I couldn't find any other parameters that might help the codec reduce bandwidth usage during silent periods.
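
For reference, requesting DTX from libopus is a single encoder ctl; whether Jamulus's fixed-size packet handling could tolerate the near-empty DTX packets is exactly the jitter buffer constraint discussed above (sketch only):

#include <opus/opus.h>

// Sketch: enable Opus DTX so the encoder emits near-empty packets during
// silence and the decoder fills in comfort noise (CNG) on the far end.
OpusEncoder* makeDtxEncoder() {
    int err = 0;
    OpusEncoder* enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK || enc == nullptr) {
        return nullptr;
    }
    opus_encoder_ctl(enc, OPUS_SET_DTX(1));  // real libopus ctl, off by default
    return enc;
}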

Another way to reduce the data exchanged might be to kill the downmix on secondary client instances, as discussed in https://github.com/corrados/jamulus/issues/96#issuecomment-616137844. No idea of the impact on the clients and server in terms of performance, but those additional mixed streams are certainly not needed when running multiple clients on the same PC/Mac.

corrados commented 4 years ago

This Issue is very old. Shouldn't we close it?

corrados commented 4 years ago

No comment yet -> closed.