stevenlovegrove / Pangolin

Pangolin is a lightweight portable rapid development library for managing OpenGL display / interaction and abstracting video input.
MIT License

Colour and Depth buffer reading time #432

Open christian-rauch opened 5 years ago

christian-rauch commented 5 years ago

I am reading from the colour and depth buffers using OpenGL's glReadPixels function and am trying to achieve a high rendering and reading throughput.

Question: How do I need to use glReadPixels to achieve the fastest reading time for colour and depth buffers?

example code:

The following modified HelloPangolin example (code + CMake: HelloPangolin.zip) reads the buffers directly into an OpenCV matrix/image, measures the time of this operation, and writes the images to disk:

#include <pangolin/pangolin.h>
#include <opencv2/opencv.hpp>   // cv::Mat, cv::imwrite, cv::normalize

#define READ_BUFFER

int main( int /*argc*/, char** /*argv*/ )
{
    pangolin::CreateWindowAndBind("Main",640,480);
    glEnable(GL_DEPTH_TEST);

    // Define Projection and initial ModelView matrix
    pangolin::OpenGlRenderState s_cam(
        pangolin::ProjectionMatrix(640,480,420,420,320,240,0.2,100),
        pangolin::ModelViewLookAt(-2,2,-2, 0,0,0, pangolin::AxisY)
    );

    // Create Interactive View in window
    pangolin::Handler3D handler(s_cam);
    pangolin::View& d_cam = pangolin::CreateDisplay()
            .SetBounds(0.0, 1.0, 0.0, 1.0, -640.0f/480.0f)
            .SetHandler(&handler);

#ifdef READ_BUFFER
    // reserve memory for the colour and depth images
    const int h = 480, w = 640;
    cv::Mat colour(h, w, CV_8UC3);
    cv::Mat_<float> depth_gl(h, w);
    cv::Mat_<uint8_t> depth_ui8(h, w);
#endif

    while( !pangolin::ShouldQuit() )
    {
        // Clear screen and activate view to render into
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        d_cam.Activate(s_cam);

        // Render OpenGL Cube
        pangolin::glDrawColouredCube();

        // Swap frames and Process Events
        pangolin::FinishFrame();
#ifdef READ_BUFFER
        GLuint query;
        glGenQueries(1, &query);
        glQueryCounter(query, GL_TIMESTAMP);
        int done;

        GLint64 tgl_start, tgl_end;

        // read colour buffer
        // read BGR instead of RGB because of OpenCV's colour channel order
        glGetInteger64v(GL_TIMESTAMP, &tgl_start);
        glReadPixels(0, 0, w, h, GL_BGR, GL_UNSIGNED_BYTE, colour.data);
        if(glGetError()!=GL_NO_ERROR) { throw std::runtime_error("glReadPixels (GL_BGR) failed!"); }
        for(done = 0; !done; glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &done));
        glGetInteger64v(GL_TIMESTAMP, &tgl_end);
        std::cout << "gl colour read time: " << (tgl_end-tgl_start)/float(1e6) << " ms" << std::endl;

        cv::imwrite("colour.png", colour);

        // read depth buffer
        glGetInteger64v(GL_TIMESTAMP, &tgl_start);
        glReadPixels(0, 0, w, h, GL_DEPTH_COMPONENT, GL_FLOAT, depth_gl.data);
        if(glGetError()!=GL_NO_ERROR) { throw std::runtime_error("glReadPixels (GL_DEPTH_COMPONENT) failed!"); }
        for(done = 0; !done; glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &done));
        glGetInteger64v(GL_TIMESTAMP, &tgl_end);
        std::cout << "gl depth read time: " << (tgl_end-tgl_start)/float(1e6) << " ms" << std::endl;

        cv::normalize(depth_gl, depth_ui8, 0, 255, cv::NORM_MINMAX);

        cv::imwrite("depth.png", depth_ui8);
#endif
    }

    return 0;
}
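
For reference, the times above come from two glGetInteger64v(GL_TIMESTAMP, ...) calls around each read, with a single query polled for availability in between. An alternative (just a sketch, assuming GL_ARB_timer_query / OpenGL 3.3 is available) is to bracket each read with two glQueryCounter timestamps and take the difference on the GPU timeline:

GLuint q[2];
glGenQueries(2, q);

glQueryCounter(q[0], GL_TIMESTAMP);   // timestamp recorded before the read
glReadPixels(0, 0, w, h, GL_BGR, GL_UNSIGNED_BYTE, colour.data);
glQueryCounter(q[1], GL_TIMESTAMP);   // timestamp recorded after the read

// block until the later query result is available, then fetch both timestamps (nanoseconds)
GLint available = 0;
while(!available) { glGetQueryObjectiv(q[1], GL_QUERY_RESULT_AVAILABLE, &available); }
GLuint64 t0 = 0, t1 = 0;
glGetQueryObjectui64v(q[0], GL_QUERY_RESULT, &t0);
glGetQueryObjectui64v(q[1], GL_QUERY_RESULT, &t1);
std::cout << "gl colour read time: " << (t1 - t0) / 1e6 << " ms" << std::endl;

glDeleteQueries(2, q);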

buffer reading time:

I get different reading times for the colour and depth buffers on different machines, which makes me wonder what reading times I can realistically expect.

I read somewhere that the internal buffer format and the type requested by glReadPixels (GL_FLOAT) need to match to avoid type conversions. But when using GL_UNSIGNED_INT_24_8 (24-bit depth, 8-bit stencil) in place of GL_FLOAT I get GL_INVALID_OPERATION (0x0502).
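
As far as I can tell from the glReadPixels documentation, GL_UNSIGNED_INT_24_8 is only accepted together with the format GL_DEPTH_STENCIL (and only if the framebuffer actually has a packed depth/stencil attachment), which would explain the GL_INVALID_OPERATION when it is combined with GL_DEPTH_COMPONENT. A sketch of the accepted combination (assuming <vector> is included):

std::vector<GLuint> depth_stencil(w * h);
glReadPixels(0, 0, w, h, GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8, depth_stencil.data());
// each element packs the depth value in the upper 24 bits and the stencil value in the lower 8 bits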

1 ms for reading the colour buffer (w x h x 3 x 8 bit) seems reasonable to me. Since the depth buffer is supposed to be of roughly the same size (w x h x 24 bit), I would expect similar reading times there.
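
For reference, the amount of data moved per frame in the example above is roughly:

colour read (GL_BGR, GL_UNSIGNED_BYTE):     640 x 480 x 3 B ≈ 0.92 MB
depth read  (GL_DEPTH_COMPONENT, GL_FLOAT): 640 x 480 x 4 B ≈ 1.23 MB

so the depth read as written above transfers slightly more data than the colour read, plus whatever conversion the driver performs from the (typically 24-bit fixed-point) depth buffer to GL_FLOAT.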

How do I need to set up Pangolin or glReadPixels to achieve the fastest buffer reading times?

stevenlovegrove commented 3 years ago

If you haven't noticed, I'm trying to get through my Issue backlog with some free time I have right now :)

Again, 3 years too late, but this isn't a simple topic! glReadPixels is a pretty horrible API because it blocks (and we know multithreading in OpenGL is rough). It has to block because it needs to guarantee the lifetime of the memory it's copying into, and you need to know when it's ready.

I'd recommend reading http://www.songho.ca/opengl/gl_pbo.html about pixel buffer objects and the causes of GPU stalls. You can also upload to / download from PBOs asynchronously (e.g. https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf) so that you can overlap IO with rendering (again, you need to avoid operations that will cause the execution queue to stall).
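
To make that concrete, here is a minimal (non-Pangolin-specific) sketch of double-buffered readback with PBOs, reusing the w, h and colour variables from the example above and assuming <cstring> for std::memcpy. With a GL_PIXEL_PACK_BUFFER bound, glReadPixels only schedules the copy into the PBO and returns; mapping the PBO that was filled on the previous frame gives back data that is already resident, so neither call has to stall the pipeline:

// one-time setup: two PBOs sized for a w x h BGR image
GLuint pbo[2];
glGenBuffers(2, pbo);
for(int i = 0; i < 2; ++i) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, w * h * 3, nullptr, GL_STREAM_READ);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

int frame = 0;

// per frame, after rendering:
const int write_idx = frame % 2;         // PBO receiving this frame's pixels
const int read_idx  = (frame + 1) % 2;   // PBO holding the previous frame's pixels

// start the asynchronous copy into the bound PBO (the last argument is a byte offset, not a pointer)
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[write_idx]);
glReadPixels(0, 0, w, h, GL_BGR, GL_UNSIGNED_BYTE, 0);

// map the PBO whose transfer was started last frame and copy the (one frame old) data out;
// note that on the very first frame this PBO has not been written to yet
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[read_idx]);
if(void* src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY)) {
    std::memcpy(colour.data, src, w * h * 3);
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
++frame;

The depth buffer can be handled the same way with a second pair of PBOs sized w * h * sizeof(float); the trade-off is that the data you get back is one frame behind the render.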