wb2osz / direwolf

Dire Wolf is a software "soundcard" AX.25 packet modem/TNC and APRS encoder/decoder. It can be used stand-alone to observe APRS traffic, as a tracker, digipeater, APRStt gateway, or Internet Gateway (IGate). For more information, look at the bottom 1/4 of this page and in https://github.com/wb2osz/direwolf/blob/dev/doc/README.md
GNU General Public License v2.0
1.51k stars 299 forks source link

100% cpu usage with I2C audio hats #491

Open craigerl opened 8 months ago

craigerl commented 8 months ago

I'm observing direwolf with 100% cpu usage when using I2C audio boards, specifically

observed on Pi 3A+, Pi Zero2W, Raspberry Pi OS bullseye not observed on the original Pi Zero. not observed with USB audio dongle arecord and mpg123 do not generate significant cpu usage with the above i2c boards observed on direwolf 1.6 and 1.7-dev-A

-craig KM6LYW

dranch commented 8 months ago

Are you using 32bit or 64bit Raspberry Pi OS? Can you please post your Direwolf configuration? Can you also please try the newest version of Direwolf from the Dev branch as Dev-A is rather old now

craigerl commented 8 months ago

Hey David,

32bit Bullseye Not using any configuration file Observed with dev branch direwolf as of this posting

thanks, -craig

craigerl commented 8 months ago

Using "-n 2" on the direwolf command line with no other options and no config file fixes the problem. cpu usage is normal (17%)

related unresolved thread, which also uses an i2c sound device ,

https://github.com/wb2osz/direwolf/issues/131

dranch commented 8 months ago

Generally speaking, Direwolf really wants a configuration file to read so I recommend you enable that. Without specifying the "-n 2" item, can you copy/paste in what you're seeing when you start Direwolf from the command line and we can see what it's assuming?

craigerl commented 8 months ago

replicated on Buster, direwolf 1.6 using "-n 2" or "ACHANNELS 2" is a fix in this case as well

craigerl commented 8 months ago
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
  580 pi        20   0  139440  10744   7812 S 100.0   2.2   3:38.49 direwolf 

MYCALL KM6LYW-2 IGSERVER noam.aprs2.net IGLOGIN KM6LYW-2 000000 PBEACON sendto=IG compress=1 delay=00:10 every=30:00 symbol="igate" overlay=R lat=38.9919N long=120.9956W comment=" DigiPi http://craiger.org/digipi/" DCD GPIO 16 DWAIT 0 TXDELAY 30 TXTAIL 10 AGWPORT 8000 KISSPORT 8001

export ALSA_CARD=0 direwolf -d t -p -q d -t 0 -c /run/direwolf.tnc.conf | tee /home/pi/direwolf.log &

Dire Wolf version 1.6 Includes optional support for: gpsd hamlib cm108-ptt

Reading config file /run/direwolf.tnc.conf Audio device for both receive and transmit: default (channel 0) Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 44100 sample rate / 3. KM6LYW_Fix: waiting 500ms before intializing gpio because of udev lag. KM6LYW_Fix: waiting 500ms before intializing gpio because of udev lag. Ready to accept AGW client application 0 on port 8000 ... Virtual KISS TNC is available on /dev/pts/1 Created symlink /tmp/kisstnc -> /dev/pts/1 Ready to accept KISS TCP client application 0 on port 8001 ... Can't get address for IGate server noam.aprs2.net, Temporary failure in name resolution Can't get address for IGate server noam.aprs2.net, Temporary failure in name resolution [ig] KM6LYW-2>APDW16:!R:l%i/uLi& ! DigiPi http://craiger.org/digipi/ Can't get address for IGate server noam.aprs2.net, Temporary failure in name resolution

Now connected to IGate server noam.aprs2.net (44.25.16.4) Check server status here http://44.25.16.4:14501

[ig] # aprsc 2.1.14-g5e22b37 [ig] # logresp KM6LYW-2 verified, server T2USANW

craigerl commented 8 months ago

Using "-r 22000" also fixes the 100% cpu thread.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
 1962 pi        20   0  132268  11372   7900 S  13.6   2.3   0:33.53 direwolf 

Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 22000 sample rate.

edit: 44000 works (14% cpu) 44200 works (14% cpu) 56000 works (18% cpu) 192000 works (60%cpu) 44100 fails (100%cpu) 48000 fails (100% cpu) 48001 works (16% cpu)

anything but 44100 and 48000 seems to keep direwolf from consuming an entire cpu core no correlation between frequency divider ( / 3 )

dranch commented 8 months ago

Interesting.. btw, the Fe-Pi now the wb7fhc Nexus DR-X sound hat offers more sampling rates than most sound devices. You can get a full list of it's sampling rates by running:

sudo lsusb -vv | grep -e Audio -e tSamFreq | grep -v -e Descriptor -e bInterfaceClass

Anyway.. it's strange that the common sampling rates create high load for you on this truly stereo input / output sound device. We will have to wait to hear back from WB2OSZ on this one.

craigerl commented 8 months ago

Thanks David, this isn't a USB device (I2C device), and yah, it has something to do with the supported hardware rates for sure.

32K seems to be a hardware rate as well 32000 (41% cpu) 32001 (23% cpu)

Running stack trace on the audio thread is even more revealing,

During normal cpu usage (44001 for example) we see device polling poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, -1) = 1 ([{fd=4, revents=POLLIN}]) <0.007488>

During high cpu usage (44100, 48000, 32000) we don't see the poll call at all, and we see a nanosleep instead [pid 18648] 08:38:11 <... clock_nanosleep_time64 resumed>0x72dffd48) = 0 <1.000189> [pid 18651] 08:38:11 <... ioctl resumed>, 0x1168080) = 0 <0.000179> [pid 18648] 08:38:11 clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, <unfinished ...>

dranch commented 8 months ago

One thing I noticed here is that the direwolf startup output is using the "default" audio device which is a PulseAudio device which I don't recommend. PulseAudio can do all kinds of unwanted audio processing things to the audio as well as doing silent resampling for non-native sound device sampling rates which will increase CPU load. In your direwolf.conf file, instead try specifying a native ALSA device name such as "ADEVICE plughw:1,0". Btw, also in your startup, I see you're using the shell environment variable "export ALSA_CARD=0" but I don't believe Direwolf uses that at all. If you're doing other ALSA tricks using DMIX, etc. please disable all of this and retest to see if that might be impacting Direwolf. Ultimately, Direwolf is a ALSA-native application and while it works, it's not a PulseAudio or PIpewire native application.

craigerl commented 8 months ago

Understood.

No Pulse Audio -- ever.

libasound2 honors "ALSA_CARD", so direwolf does by extension.

I get the same result when specifying an audio device in direwolf.conf (ADEVICE plughw:0,0)

no dmix afaik

craigerl commented 8 months ago

Increasing frames per period in audio.c resolves the issue.

        fpp = buf_size_in_bytes / (pa->adev[a].num_channels * pa->adev[a].bits_per_sample / 8);
//craiger - this normally comes out to 512
        fpp = 1024;

Unintended consequences, besides latency, unknown.

./src/direwolf -t 0 -r 44100 Dire Wolf version 1.7 (Oct 18 2023) BETA TEST 7 Includes optional support for: gpsd hamlib cm108-ptt

Reading config file direwolf.conf Audio device for both receive and transmit: default (channel 0) audio_open: calcbufsize (rate=44100, chans=1, bits=16) calc size=882, round up to 1024 suggest period size of 1024 frames audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048 audio_open: calcbufsize (rate=44100, chans=1, bits=16) calc size=882, round up to 1024 suggest period size of 1024 frames audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048 Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, A+, 44100 sample rate / 3. Note: PTT not configured for channel 0. (Ignore this if using VOX.) Ready to accept AGW client application 0 on port 8000 ... Ready to accept KISS TCP client application 0 on port 8001 ...

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
31592 pi        20   0  105260  10716   7824 S  13.2   2.2   0:02.78 direwolf  
wb2osz commented 8 months ago

I don't have one of these devices so I can't do any experiments. All I know is what is found here. https://github.com/NexusDR-X/nexus-audio/blob/main/README.md

Your test results are very perplexing.

44000 works (14% cpu) 44200 works (14% cpu) 56000 works (18% cpu) 192000 works (60%cpu) 44100 fails (100%cpu) 48000 fails (100% cpu) 48001 works (16% cpu)

direwolf treats the sound device very simplistically. I picked the default of 44100 because that is the CD rate and seems to be supported by all hardware. The USB audio devices also do 48000, which I believe is the DVD rate.

I don't know the internals of the audio device driver, but one might reasonably expect that it would instruct the hardware to operate at the requested sample rate if available. That would be most efficient with no rate conversion necessary. If the desired rate is not available in hardware, it would pick some higher rate, possibly an integer multiple or at least a rational number, if possible, to make rate conversion easier/more efficient.

That's not what we see at all in the test results. It could be something weird going on in the device driver.

I added a hack, oops, I mean heuristic, for ARM processors on the assumption they would be much much slower than x86 machines. Downsampling/decimation by a factor of 3 is applied if the sample rate is much higher than needed for good AFSK decoding. Nothing fancy; it just averages 'n' incoming samples together. An old single core RPi model 1 is more than adequate with this little trick and the user not having to do anything special. You will see this change to -D2 or disapear as the incoming rate decreases.

Trying to use ALSA to map two independent virtual mono audio channels adds another layer of complexity and strange possible cases. What happens if the two derived virtual channels want different rates. Would they fight over the hardware rates or it just use 96000 and burn up CPU resources downsampling it.

The buffer size is a tradeoff. small buffer = rapid transfers =lower latency = more overhead. and the opposite. If the receive latency is too long, you will transmit on top of other people because recently started transmissions take too long to reach the decoder.

The specific direwolf version shouldn't matter. I don't think the audio input processing has been touched for a long time.

You mentioned bullseye. Have you tried bookworm. direwolf compiles OK (newer compile usually means new warnings) on the 64 bit version on an RPi 3 and passes the self tests. That's as far as I got. Maybe it has a better I2C audio driver.

Steve Magnuson, AG7GN should be brought into this conversation.

73, john WB2OSZ

craigerl commented 8 months ago

Thanks for the info.

If we're dividing sample rate by 3 on ARM, is there a downside to simply specifying a rate of "14700"? That yields 10% cpu on a Pi3 and would not introduce latency.

New observation, 90 percent of CPU time is spent in a system call (not user time) when using 44100/48000.

I'm still curious about nanosleep vs polling in the audio thread. Nanosleep is only seen when using 44100/4800 rates. We see poll() otherwise. It's like nanosleep doesn't sleep.

craigerl commented 8 months ago

Bookworm observations. 32bit/lite, Direwolf 1.6, Linux 6.1.0

CPU usage is much better than bullseye, but significantly higher when using 44100 or 48000

44101 12% (10.0user, 2.3sys) 44100 20% ( 6.3user 12sys) 48000 20% (7.4user 12.8sys) 48001 16% (18user 0.3sys)

time direwolf -r44100 real 0m21.496s user 0m1.348s sys 0m2.736s

time direwolf -r 44101 real 0m21.922s user 0m2.324s sys 0m0.438s

Like with bullseye we still see nanosleeps when using 44100/48000, and we see polling with any other rate strace -Ttf direwolf -r 44100 [pid 1147] 17:41:30 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=7, tv_nsec=0}, <unfinished ...>

strace -Ttf direwolf -r 44101 poll([{fd=6, events=POLLIN|POLLERR|POLLNVAL}], 1, -1) = 1 ([{fd=6, revents=POLLIN}])

Bullseye invokes clock_nanosleep_time64() Bookworm invokes clock_nanosleep()

craigerl commented 8 months ago

dbgsym version of alsa-libs compiled, this is where the direwolf process is spending time, still need to see which driver the "sys" time is being spent...

22.82% direwolf direwolf [.] demod_afsk_process_sample 20.77% direwolf libasound.so.2.0.0 [.] snd_pcm_route_convert1_many 7.64% direwolf libarmmem-v7l.so [.] memmove 6.29% direwolf direwolf [.] multi_modem_process_sample 4.53% direwolf direwolf [.] nudge_pll 3.62% direwolf direwolf [.] demod_process_sample 2.62% direwolf direwolf [.] audio_get

craigerl commented 8 months ago

time being spent in snd_pcm driver, function snd_pcm_stream_unlock_irq()

44101

  18.49%  direwolf                  [.] demod_afsk_process_sample
  12.44%  libasound.so.2.0.0        [.] snd_pcm_route_convert1_many
  11.49%  [snd_pcm]                 [k] snd_pcm_stream_unlock_irq
   6.07%  libarmmem-v7l.so          [.] memmove
   5.22%  [kernel]                  [k] _raw_spin_unlock_irqrestore
   5.00%  direwolf                  [.] multi_modem_process_sample

44100

  78.11%  [snd_pcm]                 [k] snd_pcm_stream_unlock_irq
   2.83%  libasound.so.2.0.0        [.] snd_pcm_route_convert1_many
   2.66%  direwolf                  [.] demod_afsk_process_sample
   0.92%  libarmmem-v7l.so          [.] memmove
   0.82%  [snd_pcm]                 [k] snd_pcm_ioctl
   0.77%  [kernel]                  [k] _raw_spin_unlock_irqrestor
craigerl commented 8 months ago

patch for audio.c, testing okay at this point. Bumping up the frames per period seems alleviate the snd_pcm driver problem with a minimal increase in latency. In fact, the buffer/period is similar to what it comes up with for a USB/ic705 device. CPU usage goes from 100% to 16% on Raspberry Pi Zero 2W with this patch.

pi@digipi4:~/git/direwolf/clean/direwolf/build $ ./src/direwolf -t 0 -r 48000 Dire Wolf version 1.7 (Oct 25 2023) BETA TEST 7 Includes optional support for: gpsd hamlib cm108-ptt dns-sd

Reading config file direwolf.conf Audio device for both receive and transmit: default (channel 0) audio_open: calcbufsize (rate=48000, chans=1, bits=16) calc size=1152, round up to 2048 suggest period size of 1024 frames audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048 audio_open: calcbufsize (rate=48000, chans=1, bits=16) calc size=1152, round up to 2048 suggest period size of 1024 frames audio buffer size = 2 (bytes per frame) x 1024 (frames per period) = 2048 Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, A+, 48000 sample rate / 3. Note: PTT not configured for channel 0. (Ignore this if using VOX.) Ready to accept AGW client application 0 on port 8000 ... Ready to accept KISS TCP client application 0 on port 8001 ... DNS-SD: Avahi: Announcing KISS TCP on port 8001 as 'Dire Wolf on digipi4' DNS-SD: Avahi: Service 'Dire Wolf on digipi4' successfully registered.

pi@digipi4:~/git/direwolf/direwolf/build $ sudo cat /proc/asound/card0/pcm0p/sub0/hw_params access: MMAP_INTERLEAVED format: S16_LE subformat: STD channels: 2 rate: 48000 (48000/1) period_size: 1024 buffer_size: 131072

audio.c

//craiger nudge up buffer size until frames per period hits 1024 for i2s cards
//#define ONE_BUF_TIME 10
#define ONE_BUF_TIME 12
#if __arm__
        /* Ugly hack for RPi. */
        /* Reducing buffer size is fine for input but not so good for output. */

//craiger don't inflate output buffer size
//      if (*inout == 'o') {
//        buf_size_in_bytes = buf_size_in_bytes * 4;
//      }
#endif