
CUDA implementation of mixed-precision block QR decomposition
MIT License

Mixed-Precision Block QR Decomposition

A rectangular matrix A can be factored into the product of an orthogonal matrix Q and an upper triangular matrix R: A = QR.

- Golub and Van Loan, Matrix Computations, Fourth Edition.
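
As a quick illustration, the factorization and its defining properties can be checked in a few lines of NumPy (reference code only, not this repository's CUDA implementation):

```python
import numpy as np

# Factor a rectangular matrix A into Q (orthonormal columns) and R (upper triangular).
A = np.random.default_rng(0).standard_normal((6, 4))
Q, R = np.linalg.qr(A)  # reduced QR: Q is 6x4, R is 4x4

assert np.allclose(Q.T @ Q, np.eye(4))  # Q^T Q = I
assert np.allclose(R, np.triu(R))       # R is upper triangular
assert np.allclose(Q @ R, A)            # A = QR
```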

Overview

The QR decomposition yields a numerically stable solver for linear least-squares problems, and it is well suited to parallelism. Non-linear least-squares methods such as Gauss-Newton solve one such linear problem per iteration, which makes QR useful for processing matrix data from cameras and sensor arrays: SLAM, background subtraction, radio communications, object encoding, point cloud visualization, and many more applications.
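
For example, once A = QR is known, the least-squares problem min ||Ax - b|| reduces to a triangular solve. A minimal NumPy sketch (variable names are illustrative, not from this repository):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))  # overdetermined system: 100 equations, 5 unknowns
b = rng.standard_normal(100)

Q, R = np.linalg.qr(A)             # A = QR, Q has orthonormal columns
x = np.linalg.solve(R, Q.T @ b)    # solve the triangular system R x = Q^T b

# Matches the normal-equations solution without ever forming A^T A,
# which is what makes the QR route numerically stable.
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```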

This project was started as a University of Washington graduate school project in collaboration with Amazon Lab126. The objective: a fast and correct parallel block QR decomposition that uses half-precision (FP16) matrix-matrix multiplies. The mixed-precision block QR algorithm is well suited to large, wide matrices; other QR algorithms are better choices for small or tall-and-skinny matrices.
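
The sketch below shows the structure that "block QR with FP16 multiplies" refers to, emulated in NumPy under stated assumptions (a hypothetical block size of 32; FP16 emulated by casting, whereas real Tensor Core kernels typically accumulate in FP32). The narrow panel factorizations stay in FP32, while the large trailing-matrix updates, which dominate the flop count for wide matrices, run as half-precision matrix-matrix multiplies. This is a conceptual sketch, not the project's CUDA kernels.

```python
import numpy as np

def block_qr(A, r=32):
    """Blocked QR with FP32 panels and emulated-FP16 trailing updates.

    Assumes m >= n for simplicity. `block_qr` and `r` are illustrative
    names, not identifiers from this repository.
    """
    m, n = A.shape
    R = A.astype(np.float32).copy()
    Q = np.eye(m, dtype=np.float32)
    for k in range(0, n, r):
        b = min(r, n - k)
        # Panel factorization in FP32 (the numerically sensitive step).
        Qk, Rk = np.linalg.qr(R[k:, k:k + b], mode="complete")
        R[k:, k:k + b] = Rk
        # Trailing update Q_k^T * A_trail as an FP16 matrix multiply
        # (accumulation is float16 here; GPU kernels would accumulate in FP32).
        trail = R[k:, k + b:]
        R[k:, k + b:] = (Qk.T.astype(np.float16) @ trail.astype(np.float16)).astype(np.float32)
        Q[:, k:] = Q[:, k:] @ Qk
    return Q, np.triu(R)

# Usage: the residual reflects the deliberate FP16 rounding in the updates.
A = np.random.default_rng(2).standard_normal((64, 48)).astype(np.float32)
Q, R = block_qr(A)
print(np.max(np.abs(Q @ R - A)))  # small, but larger than pure-FP32 QR
```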

Test Data

Our test data is derived from the EuRoC MAV dataset and emulates using the QR decomposition for non-linear least-squares optimization in robot pose estimation and bundle adjustment, as in SLAM applications. The implementation works for arbitrarily sized matrices; the largest matrices in our test data are on the order of 2000 x 2000.

What this project is

What this project isn't

Dependencies

Linux

Windows

Installation

Executables can be found in the /build directory. On Windows, a Visual Studio solution (.sln) is also generated in /build.

After running the tests, run the Python script /Cuda/performance/runtime.py to parse the test logs and generate graphs.

Our Test Results (work in progress)

Error

Execution Time

Related Projects

References