raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.
5.14k stars 1.68k forks source link

vc_dispmanx_update_submit_sync hangs forever after being called exactly 128 times on RPi 4 #1182

Open ali1234 opened 5 years ago

ali1234 commented 5 years ago

On being called for the 128th time inside the vsync callback, vc_dispmanx_update_submit_sync never returns. It works fine up until this point. This seems to be 100% reproducible on RPi 4 and it always hangs after exactly 128 calls. I strongly suspect that some queue inside the firmware is being filled to capacity.

Code: https://github.com/ali1234/raspi-teletext/blob/master/teletext.c#L66

This software has been widely deployed on earlier RPi hardware and works fine there.

Note that you don't need to use the tvctl utility (not yet ported) or have a CRT TV to reproduce this.

Running Raspbian Buster which i just upgraded. Firmware version:

Jun 20 2019 16:06:32 
Copyright (c) 2012 Broadcom
version 407b1da8fa3d1a7108cb1d250f5064a3420d2b7d (clean) (release) (start_x)
ali1234 commented 5 years ago

I am getting reports that vsync is also broken on previous models. See https://github.com/ali1234/raspi-teletext/issues/22

Not sure if it is the same problem yet.

popcornmix commented 5 years ago

Calling a vchiq function from a vchiq callback is a recipe for deadlock and not supported. You need to signal a thread from the vsync callback and submit the new update from there.

ali1234 commented 5 years ago

This solves the problem for me. Does it look okay?

https://github.com/ali1234/raspi-teletext/commit/b8695eca93226f3c149cbf499262f0131f79f722

popcornmix commented 5 years ago

Yes, that looks fine.

ali1234 commented 5 years ago

Thanks!

ali1234 commented 5 years ago

After testing the theaded code I have found the real problem: the behaviour of vc_dispmanx_update_submit_sync has changed and it now does not return until after the next vsync. In previous firmware versions it would return immediately. The result of this is that if you call it you are guaranteed to drop a frame. If you call it from the vsync callback then the next callback is queued, leading to the repeatable 128 frame deadlock. If you call if from a thread then the next callback is simply ignored, meaning you are guaranteed to only be able to update the screen at most every other frame.

I need to be able to update the screen exactly once per frame in a way that is compatible with past and future firmwares.

popcornmix commented 5 years ago

vc_dispmanx_update_submit will return immediately vc_dispmanx_update_submit_sync will wait for next vsync.

ali1234 commented 5 years ago

Yes, the difference is that vc_dispmanx_update_submit_sync used to return after the next vsync but before the next call to the vsync callback. Now it returns after the next vsync callback - sometimes an entire frame later.

In other words the old version would never call the vsync callback during a call to update_sync. The new version can call the vsync callback as often as twice during the same call to update_sync.

popcornmix commented 5 years ago

There was a firmware fix for late vsync callbacks on June 23. Looks like your firmware is just too old. Can you try updating?

ali1234 commented 5 years ago

I'm on the latest apt-get firmware. I'll try rpi-update.

ali1234 commented 5 years ago

rpi-update firmware has the same behaviour.

I am able to reliably hit the frame deadlines by doing this: https://github.com/ali1234/raspi-teletext/commit/d269857a2a0ddaadf86ba37ab25e5da34fd12042

I need to do some more testing to see if it is still dropping frames.

ali1234 commented 5 years ago

Even this drops frames, even though I'm meeting every deadline and submitting the updates as soon as possible.

I am sending alternating black and white frames so that frame drops are very easy to see. And I have added a callback parameter to the submit_update call. I'm seeing exactly the same thing as I did with submit_update_sync:

  1. The vsync callback is called. It prints "S" and then sends the thread signal.
  2. The thread receives the signal and prints "A", then calls submit_update. Then prints "B\n".
  3. The submit_update callback prints "U" and then returns.

And I see output like:

SUAB
SUAB
SUAB
SAB
SUAUB
SUAB

Note that every S is followed by an AB, meaning I'm meeting the deadline - submit_update always returns before the next vsync callback arrives. But the update callbacks aren't coming in until two vsyncs later. Whenever this happens I also see a framedrop on-screen. This happens a lot, roughly every 10 frames on average.

Sending an update and then receiving two vsyncs before it actually completes is exactly the same problem as I had with submit_update_sync.

I also tried removing the printfs incase that was causing blocking when redrawing the terminal, and I still see the same frame drops.

ali1234 commented 5 years ago

I've made a stripped down version of my test code (warning, it strobes the display):

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <time.h>
#include <assert.h>
#include <unistd.h>
#include <sys/time.h>
#include <pthread.h>

#include "bcm_host.h"

#ifndef ALIGN_UP
#define ALIGN_UP(x,y)  ((x + (y)-1) & ~((y)-1))
#endif

static DISPMANX_DISPLAY_HANDLE_T   display;
static DISPMANX_ELEMENT_HANDLE_T   element;
static DISPMANX_RESOURCE_HANDLE_T  resource[2];
static int next_resource = 0;

static unsigned short palette[256] = { 0x0, 0xffff, 0xf000 };

uint8_t *image;
VC_RECT_T image_rect;
#define TYPE (VC_IMAGE_8BPP)
#define WIDTH (32)
#define HEIGHT (32)
#define OFFSET (8)
#define PITCH (ALIGN_UP(WIDTH, 32))
#define ROW(n) (image+(PITCH*(n))+OFFSET)

pthread_cond_t cond1 = PTHREAD_COND_INITIALIZER;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void update_callback(DISPMANX_UPDATE_HANDLE_T u, void* arg)
{
    printf("U");
}

void vsync(void)
{
    DISPMANX_UPDATE_HANDLE_T    update;

    printf("A");
    update = vc_dispmanx_update_start( 10 );
    printf("B");

    vc_dispmanx_element_change_source( update, element, resource[next_resource]);

    pthread_mutex_lock(&lock);
    pthread_cond_wait(&cond1, &lock);
    printf("C");
    vc_dispmanx_update_submit( update, update_callback, NULL );
    printf("D\n");
    pthread_mutex_unlock(&lock);

    next_resource = next_resource ^1;
    // fill image
    memset(image, next_resource, WIDTH*HEIGHT);
    // write to resource
    vc_dispmanx_resource_write_data(  resource[next_resource], TYPE, PITCH, image, &image_rect );
}

void vsync_callback(DISPMANX_UPDATE_HANDLE_T u, void* arg)
{
    printf("S");
    pthread_cond_signal(&cond1);
}

void *vsync_thread()
{
    while(1) vsync();
}

int main(int argc, char *argv[])
{
    VC_RECT_T       src_rect;
    VC_RECT_T       dst_rect;

    DISPMANX_UPDATE_HANDLE_T    update;
    uint32_t                    vc_image_ptr;

    bcm_host_init();

    display = vc_dispmanx_display_open( 0 );

    image = calloc( 1, PITCH * HEIGHT ); // buffer 0

    // set up some resources
    vc_dispmanx_rect_set( &image_rect, 0, 0, WIDTH, HEIGHT);
    for (int n=0;n<2;n++) {
        resource[n] = vc_dispmanx_resource_create( TYPE, WIDTH, HEIGHT, &vc_image_ptr );
        vc_dispmanx_resource_set_palette(  resource[n], palette, 0, sizeof palette );
        vc_dispmanx_resource_write_data(  resource[n], TYPE, PITCH, image, &image_rect );
    }

    update = vc_dispmanx_update_start( 10 );
    vc_dispmanx_rect_set( &src_rect, 0, 0, WIDTH << 16, HEIGHT << 16 );
    vc_dispmanx_rect_set( &dst_rect, 0, 0, WIDTH, HEIGHT );
    element = vc_dispmanx_element_add( update, display, 2000,
                                       &dst_rect, resource[0], &src_rect,
                                       DISPMANX_PROTECTION_NONE,
                                       NULL, NULL, VC_IMAGE_ROT0 );

    vc_dispmanx_update_submit_sync( update );

    pthread_t tid1;
    pthread_create(&tid1, NULL, vsync_thread, NULL);

    vc_dispmanx_vsync_callback(display, NULL, NULL);
    vc_dispmanx_vsync_callback(display, vsync_callback, NULL);

    while(1) {
        ;
    }
    return 0;
}

On a Pi 4 it outputs like this:

ABSUCD
ABSUCD
ABSCD
ABSUUCD
ABSUCD
ABSUCD

and visibly drops a frame every time this happens.

On a Pi 3 it only outputs ABSUCD as expected, and never drops frames.

This was tested with the same SD card, which was apt updated and rpi-update updated yesterday.

I have removed all the assert checking for clarity. It has no effect on the bug.

ali1234 commented 5 years ago

I just read this on https://github.com/raspberrypi/firmware/issues/1154

"Due to a quirk in the way the hardware was behaving, the loading of the display list into the live HVS registers moved from an interrupt approx 32 lines before the end of the frame, to the start of frame. This means that your dispmanx_update_submit_sync has almost certainly missed the slot for the current frame as you only get the vblank interval."

So I added a usleep(3000); between the signal wait and the update submit. This is about 58 lines, so longer than the vblank, and guarantees that I always miss the current frame. Since the update doesn't block I can still send one after the next vsync that will be guaranteed to miss the next frame. Everything is therefore guaranteed to be drawn exactly one frame late. And this fixes the frame drops on pi 4. I have not yet tested how this behaves on pi 3.

ali1234 commented 5 years ago

In brief testing this seems to work okay on pi 3 as well. Note that drawing one frame late is acceptable for my use case. The only thing I need is to show every frame exactly once.

6by9 commented 5 years ago

Loading the HVS display list at the start of frame resulted in increased display latency and using dispmanx_update_submit_sync would only get half the display framerate. It's now being loaded off a timer approx 2ms (IIRC) before the vsync. It may be possible to pull it closer to the vsync, but that was the initial, slightly conservative, value that meant dispmanx_update_submit_sync would run at full speed again.

ali1234 commented 5 years ago

I am currently using these firmwares:

Pi 3: Aug 21 2019 12:10:49 Copyright (c) 2012 Broadcom version 0e6daa5106dd4164474616408e0dc24f997ffcf3 (clean) (release) (start_x)

Pi 4: Aug 21 2019 12:07:57 Copyright (c) 2012 Broadcom version efc327d047b55ae0420ba2bd7673f5a952cd8920 (clean) (release) (start_x)

If the HVS is loaded 2ms before the vsync, then that means it is loaded 14.6ms after the previous vsync.

My userspace thread calls submit immediately after it wakes, and misses this deadline about 10% of the time. Adding a 100 usec delay causes it to miss the deadline about 50% of the time. Adding a 500usec extra delay causes it to miss the deadline 100% of the time, which stabilizes the framerate. Then at 14000 usec delay it becomes unstable again as it is now missing two deadlines. And so on.

I can only see two possible explanations for this:

  1. The firmware RPC + thread wait mechanism has a round-trip latency of exactly 14.5ms +/- 0.5ms. This would mean it is impossible to ever achieve reliable 60FPS with submit_sync.

  2. The firmware I am using is not fixed and is still loading the HVS almost immediately after sending the vsync.

What do you think?

ali1234 commented 5 years ago

Today I connected a scope to the composite video out and asserted a gpio from the vsync callback. There are the results, using firmware that I updated today.

First Pi 4:

DS1Z_QuickPrint2

Here you can see that the vsync callback is called approximately 1ms before the real vsync. It is possible to submit an update in this time if you are very fast, but notice that the vsync occasionally arrives late, as much as 4ms late, which pushes it well in to the next frame and guarantees a frame drop.

And Pi 3:

DS1Z_QuickPrint3

Here the result is mostly the same: vsync callback arrives 1ms before the real vsync. But notice that although the vsync callback is sometimes late, it does not happen as often, and when it does it is only about 1ms late. This is still enough to cause a frame drop, but it happens less often, which is exactly what I've observed previously.

So overall I would expect the vsync callback to be called immediately after the real vsync, to give the maximum possible time for userspace to submit update. But instead it seems to get called at the last possible time you can submit an update for the current frame, meaning it is nearly but not quite impossible to ever get a stable frame rate without intentionally delaying updates so that they always go to the next frame.

This means if you use submit_sync you have a choice between stuttery video, or only being able to achieve a stable 50% of the real framerate.

The Pi 3 firmware version I tested was:

Aug 22 2019 14:00:42 
Copyright (c) 2012 Broadcom
version 0e6daa5106dd4164474616408e0dc24f997ffcf3 (clean) (release) (start_x)

Unfortunately I forgot to write down the Pi 4 version, but it came from the same rpi-update.

ali1234 commented 5 years ago

Forgot to mention:

The scope traces are from top to bottom: composite video, GPIO 5, 6, 13. The scope is set to trigger on the vsync of the odd field which is positioned in the middle of the scope. The horizontal scale shows about half of a field (12ms).

And here is the source code of the program I used to generate the test:

#include <assert.h>
#include <pthread.h>
#include <wiringPi.h>

#include "bcm_host.h"

typedef struct render_shared {
    DISPMANX_DISPLAY_HANDLE_T   display;

    pthread_cond_t cond;
    pthread_mutex_t lock;
    pthread_t thread;
    int done;

} render_shared;

void *render_thread_func(void *anon_render_shared)
{
    render_shared *r = anon_render_shared;

    while(!r->done) {
        pthread_cond_wait(&r->cond, &r->lock);
        digitalWrite (13, 1);
        usleep(500);
        digitalWrite (13, 0);
    }

    return NULL;
}

void vsync_callback(DISPMANX_UPDATE_HANDLE_T u, void *anon_render_shared)
{
    render_shared *r = anon_render_shared;
    digitalWrite (5, 1);
    pthread_cond_signal(&r->cond);
    digitalWrite (6, 1);
    usleep(250);
    digitalWrite (5, 0);
    digitalWrite (6, 0);
}

void *render_start(void)
{
    // Build the render struct. Must malloc it so it is still valid
    // after this function returns.
    render_shared *r = (render_shared *)calloc(1, sizeof(render_shared));

    pthread_cond_init(&r->cond, NULL);
    pthread_mutex_init(&r->lock, NULL);

    r->done = 0;

    bcm_host_init();
    r->display = vc_dispmanx_display_open(0);

    pthread_create(&r->thread, NULL, render_thread_func, r);

    // BUG: Clear any existing callbacks, even to other apps.
    // https://github.com/raspberrypi/userland/issues/218
    // TODO: Check if we still need this.
    vc_dispmanx_vsync_callback(r->display, NULL, NULL);

    // Set the callback function.
    vc_dispmanx_vsync_callback(r->display, vsync_callback, r);

    return r;
}

void render_stop(void *anon_render_shared)
{
    render_shared *r = anon_render_shared;

    // Stop the thread first in case it is waiting on a signal.
    r->done = 1;
    pthread_join(r->thread, NULL);

    // Stop the vsync callbacks.
    vc_dispmanx_vsync_callback(r->display, NULL, NULL);

    pthread_cond_destroy(&r->cond);
    pthread_mutex_destroy(&r->lock);

    free(r);
}

int main(int argc, char *argv[])
{
    if (wiringPiSetupGpio() == -1)
        return 1 ;

    pinMode (5, OUTPUT);
    pinMode (6, OUTPUT);
    pinMode (13, OUTPUT);

    void *x = render_start();

    while(1) sleep(1);

    render_stop(x);
}
ArtArt0 commented 4 years ago

Any updates to achieve RPi4 60Hz refresh rate exactly?