raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.02k stars 4.95k forks source link

cma_alloc: alloc failed #2680

Closed cbxbiker61 closed 5 years ago

cbxbiker61 commented 6 years ago

cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16

This problem started occurring in the raspberry pi kernels in approximately the 4.14.62+ time frame. It consistently occur on current 4.14.6x and 4.18.x pi kernels after varying intervals. It makes Kodi hang and also makes streaming from my OctoPrint boxes with pi-cams attached fail.

After looking back at the commits that had take place at that timeframe this commit jumped out: 00f0e834c44c492555e43fdaf9c112ed269db01f Revert "mm: alloc_contig: re-allow CMA to compact FS pages"

I re-implemented this patch and the problem is gone:

https://github.com/raspberrypi/linux/commit/424f6c4818bbf1b8ccf58aa012ecc19c0bb9b446 mm: alloc_contig: re-allow CMA to compact FS pages

popcornmix commented 6 years ago

See: https://github.com/raspberrypi/linux/pull/2647 for info.

Unfortunately there are problems with the identified commit removed too.

Do you have any simple way of provoking the failure you see?

HiassofT commented 6 years ago

I've observed this issue, too, when using LibreELEC on an RPi3B+. I haven't completed testing yet but the result of this issue is that Kodi locks up and restarting it isn't enough to make it work again, only a reboot works.

I posted preliminary test steps on LibreELEC Slack a couple of days, here are my notes so that the info doesn't get lost:

Base installation is LibreELEC master (8.90.004 should also do), in addition I had kernel 4.14.68 running (instead of 4.14.67 in 8.90.004)

install hyperion and pvr.hts. I configured pvr.hts to use a tvheadend server running on another RPi

Configure hyperion to use a "dummy" (file output) device. This is the configuration that I used http://ix.io/1meC

Disable screensaver in Kodi, as hyperion will stop when the screensaver kicks in.

Open the EPG in Kodi (TV->Guide) and let it idle there

Start a script to repeatedly download something over the net. I used this one to download a file from a computer on my LAN

#!/bin/sh
while /bin/true ; do
        wget -O /dev/null http://camel2.lan/test/LibreELEC-RPi2.arm-8.90.004.img.gz
done

After about 20 minutes I got a cma_alloc: alloc failed error in dmesg, after about 30 minutes I got to the RPi, pressed a button to move down in the EPG, then got 2 more errors and kodi locked up - and only rebooting worked.

dmesg: http://ix.io/1mfm journal: http://ix.io/1mfn kodi: http://ix.io/1mfo vcdbg log: http://ix.io/1mfp

I'm not familiar with the code involved but my guess is that the CMA heap gets fragmented (hyperion using the dispmanx grabber, other system stuff doing CMA allocations as well) and the compaction - which causes stutter in video playback - is needed to avoid more severe errors, the cma allocation failing which some other code doesn't seem to handle gracefully.

qdel commented 6 years ago

Found this error also with motion, using pi camera.

Happen in less than 5 minutes after boot. The kernel arrived in https://archive.raspberrypi.org/debian/pool/main/r/raspberrypi-firmware/ the 11 september (20180910). A simple upgrade of the pi makes it unusable.

Rolling back to previous kernel (20180817) is working correctly.

HiassofT commented 6 years ago

Just caught another cma_alloc error, same setup as above, but without the download script running and kodi was idling at the start screen. Again kodi locked up after trying to navigate in the main menu (pressed down/up 3 or 4 times)

le9:~ # dmesg | paste http://ix.io/1mC2

Checking /proc/meminfo I noticed CmaFree was down to 0 le9:~ # paste /proc/meminfo http://ix.io/1mC0

pelwell commented 6 years ago

A hybrid scheme whereby compactions are prevented while audio is playing might be possible, e.g. by adding disable_compaction/enable_compaction methods and calling them from within the sound drivers, forcing a compaction when the count enters and leaves the zero state. There are going to be pathological use cases where that doesn't work, but it might work for most.

admindotnu commented 6 years ago

Since new kernel; X is hanging and repeating some last movements over and over. Als points back to cma

[ 189.823000] vc4-drm soc:gpu: failed to allocate buffer with size 16777216 [ 309.825862] vc4-drm soc:gpu: failed to allocate buffer with size 16777216 [ 430.093573] vc4-drm soc:gpu: failed to allocate buffer with size 16777216 [ 430.104172] vc4-drm soc:gpu: failed to allocate buffer with size 16777216 [ 430.112971] vc4-drm soc:gpu: failed to allocate buffer with size 16777216 [ 430.113041] [drm:vc4_bo_create [vc4]] ERROR Failed to allocate from CMA: [ 430.113046] [drm] kernel: 8100kb BOs (1) [ 430.113049] [drm] V3D: 74752kb BOs (103) [ 430.113054] [drm] V3D shader: 216kb BOs (53) [ 430.113057] [drm] dumb: 48kb BOs (3) [ 430.113064] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.

HiassofT commented 6 years ago

I did some more tests:

Kodi beta 2 running on kernel 4.14.69, stock install, just playing movies from an USB drive (no hyperion / dispmanx grabbing or download in the background as before).

I then watched CmaFree: while true ; do grep CmaFree /proc/meminfo ; sleep 1 ; done

With one h264 movie CmaFree continually went down - from about 6700k in the beginning to 5960k after about 15 minutes. After about 20 minutes CmaFree dropped rapidly:

CmaFree:            5852 kB
CmaFree:            4000 kB
CmaFree:            5304 kB
CmaFree:            5072 kB
CmaFree:            4768 kB
CmaFree:            4760 kB
CmaFree:            4748 kB
CmaFree:            4152 kB
CmaFree:            4024 kB
CmaFree:            3896 kB
CmaFree:            3972 kB
CmaFree:            3660 kB
CmaFree:            3428 kB
CmaFree:            3152 kB
CmaFree:            3044 kB
CmaFree:            3024 kB
CmaFree:            2660 kB
CmaFree:            2508 kB
CmaFree:            2276 kB
CmaFree:            1864 kB
CmaFree:            1884 kB
CmaFree:            1488 kB
CmaFree:            1356 kB
CmaFree:            1184 kB
CmaFree:             940 kB
CmaFree:             692 kB
CmaFree:             460 kB
CmaFree:             336 kB
CmaFree:             180 kB
CmaFree:             264 kB
CmaFree:             240 kB
CmaFree:             260 kB
CmaFree:               0 kB
CmaFree:             260 kB
CmaFree:             264 kB
CmaFree:             248 kB
CmaFree:             264 kB
CmaFree:               0 kB
CmaFree:             260 kB
CmaFree:             232 kB
CmaFree:             260 kB
CmaFree:             256 kB
CmaFree:             252 kB
CmaFree:             256 kB
CmaFree:               0 kB
CmaFree:             256 kB

Edit: there were no "cma_alloc failed" messages in dmesg so far though.

After about 25 minutes the movie is still playing fine and CmaFree is circling around 100k.

With another h264 movie CmaFree was more stable, it did decrease but also increased. I haven't let it run long enough to see a clear trend but it certainly looked a lot better than the first movie.

Every now and then CmaFree shortly dropped down to 4096 in both cases but then went back to the previous value in the next second.

detule commented 6 years ago

I wonder if, in addition to reverting https://github.com/raspberrypi/linux/commit/424f6c4818bbf1b8ccf58aa012ecc19c0bb9b446 (already done), we would be better served by also reverting https://github.com/raspberrypi/linux/commit/73e64c51afc56d4863ae225e947ba2f16ad04487 . Otherwise we are exposed to the isolation failure described in the reverted commit.

HiassofT commented 6 years ago

@detule yes, I think both commits have to be reverted - I've just been looking at the kernel history as well.

Kernel 4.9 (up to stable 4.9.125) didn't contain either of the, kernel 4.10 contained both. By just reverting one commit we clearly in a state with unfixed issues.

Getting down to the root cause of the audio cracking / video stuttering under heavy load would be best, but until that's resolved reverting these two commits looks like a possible short-therm workaround

popcornmix commented 6 years ago

@detule that certainly looks like it's worth a try. @HiassofT 73e64c5 can be reverted cleanly - can you test that?

HiassofT commented 6 years ago

@popcornmix yes, I'll test that later, probably/ tomorrow

popcornmix commented 6 years ago

I've added the second revert to 4.18 and 4.19 to see if we can any positive or negative feedback from LibreELEC builds.

I'm trying to see if can reproduce the problem on raspbian with CmaFree script + wget download with kodi playing a video on loop (either with or without revert).

I'd be interested to hear if the second revert helps anyone else with the issue. (I can confirm that kodi still runs and it doesn't obviously break anything).

popcornmix commented 6 years ago

Without the second revert, after an hour on raspbian CmaFree has dropped from 7568k to 6968k (600k). So problem seems slower than @HiassofT was seeing, but does seem like it's decreasing. I'm not sure if it will drop faster at some point. A short test with second commit reverted didn't seem to be decreasing, but I'll need a longer test to be sure.

HiassofT commented 6 years ago

I've tested the problematic movie with 73e64c51afc56d4863ae225e947ba2f16ad04487 reverted for about 35 minutes now and quite interestingly CmaFree didn't change at all during playback - it was "stuck" at 6780 kB all the time.

I'll do another test tomorrow with the revert commit reverted (so CMA is in line with upstream) and watch CmaFree again.

Testing with other movies is also in my queue.

popcornmix commented 6 years ago

After nearly ten hours by raspbian test has dropped to 4952k (2616k lost). So something suspicious there but will probably take over a day to fatally fail.

Need to stop that test now - will return to test with second revert which I believe wasn't leaking.

HiassofT commented 6 years ago

I now tested with 00f0e834c44c492555e43fdaf9c112ed269db01f reverted, so CMA compaction was in line with upstream.

After kodi startup CmaFree was 6796 kB, when playing the problematic movie from SD card it stayed at 6796.

Then I copied a ~500MB file from SD card to the SD card. During this time CmaFree dropped down to 6744 kB with occasional minimums of 4096 kB. At the end of copying and stopping the movie it was at 6744.

Then I stopped kodi, ran sync and echo 3 > /proc/sys/vm/drop_caches and CmaFree was back at 6796 as in the beginning.

I then repeated this exact same test but instead of looking at CmaFree I checked for cma_alloc durations larger than 100ms with the function graph tracer:

cd /sys/kernel/debug/tracing/
echo cma_alloc > set_ftrace_filter
echo 100000 > tracing_thresh
echo 1 > options/funcgraph-proc
echo 1 > options/funcgraph-abstime
echo function_graph > current_tracer

With kernel 4.14.69 there were indeed no lines logged in /sys/kernel/debug/tracing/trace and I didn't hear any audio glitches.

With 00f0e834c44c492555e43fdaf9c112ed269db01f reverted (so CMA at upstream 4.14 code) I got a few audio dropouts during copying and a few trace lines:

# tracer: function_graph
#
#     TIME        CPU  TASK/PID         DURATION                  FUNCTION CALLS
#      |          |     |    |           |   |                     |   |   |   |
  160.055311 |   1)   AESink-381   | @ 651758.1 us |  } /* cma_alloc */
  163.235864 |   1)   AESink-381   | @ 231745.3 us |  } /* cma_alloc */
  175.476536 |   0)   AESink-381   | @ 144924.9 us |  } /* cma_alloc */
  187.484888 |   1)   AESink-381   | @ 761338.2 us |  } /* cma_alloc */

With 73e64c51afc56d4863ae225e947ba2f16ad04487 reverted (so CMA at about 4.9 code) I got a couple more audio dropouts and trace lines:

   78.841596 |   1)   AESink-381   | $ 3399967 us  |  } /* cma_alloc */
   82.781337 |   1)   AESink-381   | $ 1240439 us  |  } /* cma_alloc */
   88.550466 |   0)   AESink-381   | $ 2422724 us  |  } /* cma_alloc */
   96.547591 |   2)   AESink-381   | @ 833290.2 us |  } /* cma_alloc */
   99.932091 |   2)   AESink-381   | $ 2736295 us  |  } /* cma_alloc */
  104.753647 |   1)   AESink-381   | @ 425985.2 us |  } /* cma_alloc */
  111.402734 |   1)   AESink-381   | $ 1946940 us  |  } /* cma_alloc */
  119.329547 |   3)   AESink-381   | @ 630468.9 us |  } /* cma_alloc */
  133.017456 |   3)   AESink-381   | @ 383313.4 us |  } /* cma_alloc */

Then I traced cma_alloc and cma_release events (I only did this with rpi-4.14.69 kernel):

echo 1 > /sys/kernel/debug/tracing/events/cma/cma_alloc/enable
echo 1 > /sys/kernel/debug/tracing/events/cma/cma_release/enable

It showed quite a lot of cma_alloc calls from AESink (and immediate afterwards releases by the vchiq driver

          AESink-383   [001] ....    40.027171: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.027330: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.077576: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.077740: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.126857: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.127019: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.177213: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.177372: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.227459: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.227618: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.276738: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.276899: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.327019: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.327180: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.377362: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.377525: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.427688: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.427847: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.476939: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.477097: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.528117: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.528293: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.577371: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....    40.577526: cma_release: pfn=27517 page=a73e773c count=1
          AESink-383   [001] ....    40.627699: cma_alloc: pfn=27517 page=a73e773c count=1 align=0

Full trace is here: http://ix.io/1mH6

It looks like the Pi audio driver puts quite a bit of pressure on the cma allocator - and is thus easily affected by latency issues - which can always occur, for various reasons.

Usually audio drivers pre-allocate buffers (for DMA audio data and the DMA descriptors, setup as a ring buffer) and then just use these during audio operation - so there's no constant alloc()/free/() happening. Probably a similar thing (maybe some preallocated buffer pool) should be done for the Pi audio driver

popcornmix commented 6 years ago

AESinkPi.cpp uses the openmax api to output audio (e.g. https://github.com/xbmc/xbmc/blob/master/xbmc/cores/AudioEngine/Sinks/AESinkPi.cpp#L485) openmax is a pretty thin layer on vchiq which is our general interface for communication with the GPU (also used by video decode, jpeg decode, opengles).

@pelwell will vchiq be dynamically allocating/freeing cma when transmitting/receiving messages?

HiassofT commented 6 years ago

I see the same alloc/release pattern when using OMX instead of MMAL:

  OMXPlayerVideo-463   [003] ....   247.327366: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.327495: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerAudio-464   [001] ....   247.349250: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.349569: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.408523: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.408662: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.439211: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.439367: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.439741: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.439887: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.470409: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.470547: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerAudio-464   [001] ....   247.523188: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.523517: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.551505: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.551654: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.551974: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.552092: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.602821: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.602965: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.643597: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.643758: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.674323: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.674442: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerAudio-464   [001] ....   247.686968: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.687298: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.755475: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.755617: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.755941: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.756070: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.837015: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.837160: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.837472: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.837573: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerAudio-464   [001] ....   247.860720: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.861041: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.918608: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.918752: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.929154: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [003] ....   247.929284: cma_release: pfn=27517 page=a73e773c count=1
  OMXPlayerVideo-463   [003] ....   247.959829: cma_alloc: pfn=27517 page=a73e773c count=1 align=0

One interesting difference though is that the interval between audio packets is longer (~200 instead of ~50ms) than when using MMAL / AESinkPi

popcornmix commented 6 years ago

Does running something like vchiq_test -p trigger anything?

pelwell commented 6 years ago

Each bulk transfer will allocate a small block to hold the scatter gather list until the transfer competes, but the other allocations should be static at the start of day.

HiassofT commented 6 years ago

yes, vchiq_test -q -p and vchiq_test -q -b 42 also show that pattern

      vchiq_test-535   [001] ....  2564.076533: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
      vchiq_test-535   [001] ....  2564.076845: cma_alloc: pfn=27518 page=a73e7760 count=1 align=0
      vchiq_test-535   [001] ....  2564.077082: cma_alloc: pfn=27519 page=a73e7784 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.077249: cma_release: pfn=27517 page=a73e773c count=1
    vchiq-slot/0-80    [000] ....  2564.077355: cma_release: pfn=27518 page=a73e7760 count=1
      vchiq_test-535   [001] ....  2564.077599: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.077774: cma_release: pfn=27519 page=a73e7784 count=1
    vchiq-slot/0-80    [000] ....  2564.078002: cma_release: pfn=27517 page=a73e773c count=1
 VCHIQ completio-536   [002] ....  2564.078083: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
 VCHIQ completio-536   [002] ....  2564.078408: cma_alloc: pfn=27518 page=a73e7760 count=1 align=0
 VCHIQ completio-536   [002] ....  2564.078505: cma_alloc: pfn=27519 page=a73e7784 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.078586: cma_release: pfn=27517 page=a73e773c count=1
 VCHIQ completio-536   [002] ....  2564.078938: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
 VCHIQ completio-536   [002] ....  2564.079039: cma_alloc: pfn=2751a page=a73e77a8 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.079134: cma_release: pfn=27518 page=a73e7760 count=1
    vchiq-slot/0-80    [000] ....  2564.079256: cma_release: pfn=27519 page=a73e7784 count=1
 VCHIQ completio-536   [002] ....  2564.079480: cma_alloc: pfn=27518 page=a73e7760 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.079597: cma_release: pfn=27517 page=a73e773c count=1
 VCHIQ completio-536   [002] ....  2564.079796: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
 VCHIQ completio-536   [002] ....  2564.079888: cma_alloc: pfn=27519 page=a73e7784 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.079974: cma_release: pfn=2751a page=a73e77a8 count=1
    vchiq-slot/0-80    [000] ....  2564.080026: cma_release: pfn=27518 page=a73e7760 count=1
 VCHIQ completio-536   [002] ....  2564.080239: cma_alloc: pfn=27518 page=a73e7760 count=1 align=0
 VCHIQ completio-536   [002] ....  2564.080330: cma_alloc: pfn=2751a page=a73e77a8 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.080413: cma_release: pfn=27517 page=a73e773c count=1
    vchiq-slot/0-80    [000] ....  2564.080472: cma_release: pfn=27519 page=a73e7784 count=1
 VCHIQ completio-536   [002] ....  2564.080685: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
 VCHIQ completio-536   [002] ....  2564.080776: cma_alloc: pfn=27519 page=a73e7784 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.080859: cma_release: pfn=27518 page=a73e7760 count=1
    vchiq-slot/0-80    [000] ....  2564.080911: cma_release: pfn=2751a page=a73e77a8 count=1
 VCHIQ completio-536   [002] ....  2564.081160: cma_alloc: pfn=27518 page=a73e7760 count=1 align=0
 VCHIQ completio-536   [002] ....  2564.081254: cma_alloc: pfn=2751a page=a73e77a8 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.081337: cma_release: pfn=27517 page=a73e773c count=1
    vchiq-slot/0-80    [000] ....  2564.081390: cma_release: pfn=27519 page=a73e7784 count=1
 VCHIQ completio-536   [002] ....  2564.081617: cma_alloc: pfn=27517 page=a73e773c count=1 align=0
 VCHIQ completio-536   [002] ....  2564.081844: cma_alloc: pfn=27519 page=a73e7784 count=1 align=0
    vchiq-slot/0-80    [000] ....  2564.081950: cma_release: pfn=27518 page=a73e7760 count=1
    vchiq-slot/0-80    [000] ....  2564.082002: cma_release: pfn=2751a page=a73e77a8 count=1
    vchiq-slot/0-80    [000] ....  2564.082030: cma_release: pfn=27517 page=a73e773c count=1
    vchiq-slot/0-80    [000] ....  2564.082232: cma_release: pfn=27519 page=a73e7784 count=1
...
HiassofT commented 6 years ago

Coming back to the original issue:: I think it's best to reinstate the CMA code and revert 00f0e834c44c492555e43fdaf9c112ed269db01f until a better solution to resolve audio stuttering under high load is found. CMA memory draining and processes locking up is certainly more severe

pelwell commented 6 years ago

I agree.

popcornmix commented 6 years ago

I've reverted 00f0e834c44c492555e43fdaf9c112ed269db01f for now, and it's in latest rpi-update kernel.

popcornmix commented 6 years ago

@pelwell for the audio/video stalls during cma compaction, would it be possible for vchiq to keep a pool of buffers, so cma allocations are only required on startup?

pelwell commented 6 years ago

I'd had that thought. Theoretically the allocations can be larger than one page, but in practise they usually don't, so we can optimise the easy case.

lategoodbye commented 6 years ago

Before any changes on bcm2835-audio, i wanted to ask if a backport of recent upstream changes would be acceptable?

popcornmix commented 6 years ago

I had a brief look through the bcm2835-audio patches and they look good. Backporting them to existing kernels would be fine.

detule commented 5 years ago

Given the report that with compaction disabled there is no audio stutter, I wonder if setting the GFP_NOIO flag for the vchiq allocation requests might be a short-term win.

popcornmix commented 5 years ago

@pelwell was https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c#L420 the location you were thinking of?

pelwell commented 5 years ago

Yes - that's the one.

popcornmix commented 5 years ago

Is replacing GFP_KERNEL with GFP_NOIO there worth a try?

pelwell commented 5 years ago

Is that going to make any difference at all on a no-swap scenario?

I thought that the problem is that these small, short-lived allocations are resulting in fragmentation, rather than the allocations themselves taking too long. The NOIO flag should result in less blocking, not more, leading to more chance that the allocation might fail without any obvious benefit.

popcornmix commented 5 years ago

I'm assuming that cma compaction gets (sometimes) triggered by the cma allocation, and when that happens on a vchiq operation it breaks video/audio playback. I guessed from @detule's comment that GFP_NOIO may have a "no compact" side effect, but I haven't confirmed that. Sure, the compaction will still occur on a different cma allocation but it may not interfere with the vchiq threads (although I suspect there may well be locks that means once the cma compaction starts, no one else can get in so there may be no real benefit).

I think the vchiq pool is a better solution. I'd also like to know why cma compaction appears to be so slow (for a fairly small cma size). @HiassofT's trace suggests durations of hundreds of millisecond to multiple seconds if I'm interpreting them correctly).

HiassofT commented 5 years ago

FYI: here's what' happening during one of the long cma_alloc calls: http://ix.io/1mO3

I can try a quick test with NOIO but IMO it'd be better to switch the vchiq driver to using a mempool

popcornmix commented 5 years ago

If GFP_NOIO doesn't have an effect I guess GFP_ATOMIC is even more likely to avoid the delay (but with a higher chance of an actual allocation failure).

pelwell commented 5 years ago

Maybe try that first.

HiassofT commented 5 years ago

GFP_NOIO looks good. cma_alloc times didn't change when I copied the ~500MB file from SD to SD. maximum was one 10ms time, the majority of the allocations were below 1ms (used that as tracing_thresh)

  538.040233 |   0)   AESink-382   | # 1386.510 us |  } /* cma_alloc */
  538.190027 |   0)   AESink-382   | # 1390.261 us |  } /* cma_alloc */
  538.339706 |   0)   AESink-382   | # 1400.677 us |  } /* cma_alloc */
  538.441158 |   0)   AESink-382   | # 1361.875 us |  } /* cma_alloc */
  538.490654 |   0)   AESink-382   | # 1434.948 us |  } /* cma_alloc */
  538.590108 |   0)   AESink-382   | # 1352.187 us |  } /* cma_alloc */
  538.639659 |   0)   AESink-382   | # 1422.708 us |  } /* cma_alloc */
  538.740301 |   0)   AESink-382   | # 1507.188 us |  } /* cma_alloc */
  538.790088 |   0)   AESink-382   | # 1389.635 us |  } /* cma_alloc */
  538.939707 |   0)   AESink-382   | # 1404.688 us |  } /* cma_alloc */
  538.990545 |   0)   AESink-382   | # 1804.427 us |  } /* cma_alloc */
  539.040037 |   0)   AESink-382   | # 1442.812 us |  } /* cma_alloc */
  543.540342 |   3)   AESink-382   | # 1696.250 us |  } /* cma_alloc */
  545.990838 |   3)   AESink-382   | # 1500.105 us |  } /* cma_alloc */
  546.140231 |   3)   AESink-382   | # 1728.177 us |  } /* cma_alloc */
  549.040270 |   3)   AESink-382   | # 1627.656 us |  } /* cma_alloc */
  549.140657 |   3)   AESink-382   | # 2017.292 us |  } /* cma_alloc */
  550.089644 |   3)   AESink-382   | # 1654.010 us |  } /* cma_alloc */
  551.289775 |   3)   AESink-382   | # 1211.667 us |  } /* cma_alloc */
  551.489020 |   3)   AESink-382   | # 1076.354 us |  } /* cma_alloc */
  555.789587 |   0)   AESink-382   | # 1029.063 us |  } /* cma_alloc */
  557.040300 |   0)   AESink-382   | # 2095.834 us |  } /* cma_alloc */
  557.091113 |   3)   AESink-382   | # 3129.948 us |  } /* cma_alloc */
  557.498781 |   3)   AESink-382   | * 10154.79 us |  } /* cma_alloc */
  560.390885 |   3)   AESink-382   | # 2705.625 us |  } /* cma_alloc */
  560.491811 |   0)   AESink-382   | # 2353.125 us |  } /* cma_alloc */
  560.690241 |   0)   AESink-382   | # 1847.084 us |  } /* cma_alloc */
  560.739261 |   0)   AESink-382   | # 1333.177 us |  } /* cma_alloc */
  560.789439 |   0)   AESink-382   | # 1408.177 us |  } /* cma_alloc */
  560.889322 |   0)   AESink-382   | # 1262.969 us |  } /* cma_alloc */
  560.939412 |   0)   AESink-382   | # 1427.969 us |  } /* cma_alloc */
  560.989248 |   0)   AESink-382   | # 1213.907 us |  } /* cma_alloc */
  564.890360 |   3)   AESink-382   | # 1464.688 us |  } /* cma_alloc */
  575.239216 |   1)   AESink-382   | # 1472.292 us |  } /* cma_alloc */
  575.489683 |   0)   AESink-382   | # 1011.927 us |  } /* cma_alloc */
  576.839921 |   0)   AESink-382   | # 1121.042 us |  } /* cma_alloc */
  577.040490 |   0)   AESink-382   | # 1635.417 us |  } /* cma_alloc */
  578.438618 |   0)   AESink-382   | # 1023.177 us |  } /* cma_alloc */
  578.539422 |   0)   AESink-382   | # 1176.823 us |  } /* cma_alloc */
  578.639071 |   0)   AESink-382   | # 1000.521 us |  } /* cma_alloc */
  578.689682 |   0)   AESink-382   | # 1107.448 us |  } /* cma_alloc */
  578.739239 |   0)   AESink-382   | # 1056.771 us |  } /* cma_alloc */
  578.839405 |   0)   AESink-382   | # 1508.125 us |  } /* cma_alloc */
  578.889227 |   0)   AESink-382   | # 1083.802 us |  } /* cma_alloc */
  578.988849 |   0)   AESink-382   | # 1201.979 us |  } /* cma_alloc */
  580.040766 |   0)   AESink-382   | # 2815.625 us |  } /* cma_alloc */
  580.389327 |   0)   AESink-382   | # 1104.791 us |  } /* cma_alloc */
  583.340106 |   0)   AESink-382   | # 1388.229 us |  } /* cma_alloc */
  583.389561 |   0)   AESink-382   | # 1072.969 us |  } /* cma_alloc */
  583.589547 |   2)   AESink-382   | # 1290.417 us |  } /* cma_alloc */
  583.989390 |   2)   AESink-382   | # 1070.052 us |  } /* cma_alloc */
  584.390930 |   2)   AESink-382   | # 2917.136 us |  } /* cma_alloc */
  585.540440 |   2)   AESink-382   | # 1663.229 us |  } /* cma_alloc */
  595.539797 |   0)   AESink-382   | # 1844.636 us |  } /* cma_alloc */
  595.840136 |   0)   AESink-382   | # 1818.125 us |  } /* cma_alloc */
  596.289287 |   0)   AESink-382   | # 1060.625 us |  } /* cma_alloc */
  596.538944 |   0)   AESink-382   | # 1022.761 us |  } /* cma_alloc */
  596.638662 |   0)   AESink-382   | # 1052.292 us |  } /* cma_alloc */
  603.139033 |   3)   AESink-382   | # 1185.208 us |  } /* cma_alloc */
  603.889355 |   3)   AESink-382   | # 1075.365 us |  } /* cma_alloc */
  604.239540 |   3)   AESink-382   | # 1112.813 us |  } /* cma_alloc */
  604.590322 |   3)   AESink-382   | # 1369.218 us |  } /* cma_alloc */
popcornmix commented 5 years ago

@HiassofT thanks. That's interesting. I suspect the vchiq pool is the better solution, but this could provide a "band-aid" if needed (I'm still wondering what happens when the cma compaction occurs through another route - does that still block vchiq?).

lategoodbye commented 5 years ago

I created a branch with all bcm2835-audio upstream changes (40 patches). After some testing i would create a pull request.

smp79 commented 5 years ago

GFP_NOIO fixed the audio stuttering for me.

pcollinson commented 5 years ago

I, too, am getting the cma: cma_alloc: alloc failed, req-size: 2 pages, ret: -16 error on my stretch 3B+ machine running motion with the pi camera. It's generating h264 files all day when it detects motion. I am wondering of the Kodi fix above fixes this, and roughly how long I will have to wait until an official kernel update happens. Sorry if this is the wrong place to be asking dumb questions.

popcornmix commented 5 years ago

@pcollinson The kodi fix is unrelated to your issue. The revert of 00f0e834c44c492555e43fdaf9c112ed269db01f may fix your issue. You can use rpi-update to get the testing kernel now, or you can wait for it to reach apt (probably a few days).

pcollinson commented 5 years ago

Many thanks for the response. I'll wait.

pcollinson commented 5 years ago

I installed a dead man's handle that spotted the error message in dmesg and rebooted. This machine has a camera running 24/7 and uses USB drives as disks. Today's replacement kernel seems to have cured things - and also made a profound change in the values of CmaFree which I snapped every 5 mins. If this is interesting I can upload the log.

pelwell commented 5 years ago

How about uploading the log and saying what the change is? Will we need to see a "before" log to fully understand the change?

pcollinson commented 5 years ago

Ok here's the log - basically the first few lines are 'getting' it going - then you can see the machine rebooting itself. I've added in the point where I installed the new kernel this morning. The noticeable thing is how the CMFree numbers stabilise after the new kernel is installed. So it looks to me that something was losing memory before. The camera was working throughout all this. log.txt

pelwell commented 5 years ago

Yep - that's quite an improvement.

cbxbiker61 commented 5 years ago

Did the last fix get pulled into the 4.18.x branch? I tested that a couple of days ago and get the cma allocation error. On the other hand, I built 4.14.x today and it seems to be working so far.

popcornmix commented 5 years ago

I believe 4.18 is just using the upstream code (no reverts) with the GFP_NOIO flag added to vchiq allocations. That should match 4.14 code and isn't expected to have the cma_alloc failures.