Disclaimer: This is an independent documentation project based on a combination of static analysis and trial and error on real hardware. This work is 100% independent from, and not sanctioned by or connected with, Broadcom or its agents. No Broadcom documents or materials were used beyond those publicly available (see Referenced Materials). This work was undertaken, and the information provided, for non-commercial use on the expectation that hobbyists of all ages will find the details useful for understanding and working with their Raspberry Pi hardware. The hope is that Broadcom will be flattered by the interest in the device and understand the benefits of opening up understanding to a larger audience of potential customers and developers. Broadcom should be commended for making their SoC available for a project as exciting as the Raspberry Pi. The intent is that no copyrighted materials are contained in this repository.
Purpose of this repo: Documentation and samples on the VideoCore IV instruction set as used in the BCM SoC used in the Raspberry Pi. As of early 2016, Broadcom has yet to release public information on the VPU, so it is hoped you find this repo useful.
The BCM2835 SoC (System on a Chip) in the original Raspberry Pi has the following significant computation units:
Newer Raspberry Pi models mix things up with faster and more modern ARM cores, but the VPU information here is still relevant.
For more information on the Raspberry Pi, see the foundation's site at http://raspberrypi.org, or the embedded linux wiki at http://elinux.org/R-Pi_Hub.
Active discussions take place on IRC (freenode) on #raspberrypi-internals, #raspberrypi-osdev, #raspberrypi-dev, and #raspberrypi.
There is a raspberrypi-internals mailing list; you can subscribe at the mailing list page at freelists.org.
We are in a very early stage of understanding of the device. At this stage we only have Serial IO and GPIO for flashing things like the status LED. You will need to attach a terminal to the Mini UART on the GPIO connector. For more details see "Getting started" below.
It is now possible to use VideoCore Kernels from Userland / Linux, see https://github.com/hermanhermitage/videocoreiv/wiki/VideoCore-IV-Kernels-under-Linux. Our understanding of the VideoCore processor is nearing completion, and it is an excellent target for integer SIMD and DSP kernels. Essentially, it can be used for 16-way SIMD processing of 8, 16 and 32 bit integer values.
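To make the "16 way SIMD" idea concrete, here is a minimal scalar model in Python of one operation applied across 16 lanes at a chosen lane width. This is only an illustrative sketch, not VPU code: the function name `simd_add` is hypothetical, and the wrap-around behaviour on overflow is an assumption made for the example, not a statement about how the hardware handles overflow.

```python
LANES = 16  # the 16-way width mentioned in the text


def simd_add(a, b, bits):
    """Add two 16-lane vectors element-wise, wrapping each lane to `bits` bits.

    Hypothetical helper for illustration only. Wrap-around (modular)
    arithmetic is an assumption of this sketch, not documented VPU behaviour.
    """
    if bits not in (8, 16, 32):
        raise ValueError("lane width must be 8, 16 or 32 bits")
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("each operand must have exactly 16 lanes")
    mask = (1 << bits) - 1  # keeps only the low `bits` bits of each lane
    # Every lane is computed independently of the others; that independence
    # is what lets SIMD hardware process all 16 lanes in one operation.
    return [(x + y) & mask for x, y in zip(a, b)]
```

For example, `simd_add(list(range(16)), [250] * 16, 8)` treats every lane as an 8-bit value, so lanes whose sum exceeds 255 wrap around under this model. A real kernel would express the same operation as a single vector instruction rather than a loop.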
I recommend starting with Julian's GNU toolchain, at https://github.com/itszor/vc4-toolchain
[2016-06-15] SDRAM and ARM initialization reference code for the VPU is now available at https://github.com/christinaa/rpi-open-firmware
[2016-05-03] Kristina Brooks has got David Given's LLVM fork to work on rPi at https://github.com/christinaa/LLVM-VideoCore4
[2016-04-23] Julian Brown has pulled together bits and pieces of previous GNU toolchain work and fixed them up so they work together at https://github.com/itszor/vc4-toolchain
[2015-11-08] (QPU). Koichi NAKAMURA has developed a Python library for GPGPU on Raspberry Pi at https://github.com/nineties/py-videocore.
[2016-04-21] (QPU). mn416 has developed QPULib, a programming language and compiler for the Raspberry Pi's Quad Processing Units at https://github.com/mn416/QPULib
[2015-01-02] (QPU). A new QPU macro assembler from Marcel Müller. This builds on Pete and Eman’s earlier QPU assemblers to include support for macros and functions, at http://maazl.de/project/vc4asm/doc/index.html and https://github.com/maazl/vc4asm/
[2014-10-28] (VPU). RPi foundation discusses how Argon Design use the VPU to accelerate stereo depth perception at https://www.raspberrypi.org/blog/real-time-depth-perception-with-the-compute-module/. A comment about using MMAL is at https://www.raspberrypi.org/blog/real-time-depth-perception-with-the-compute-module/#comment-1078440
[2014-06-10] (QPU). Louis Howe gives a talk on 'Hacking the Raspberry Pi's VideoCore IV GPU' at https://www.youtube.com/watch?v=eZd0IYJ7J40
[2014-06-09] (QPU). Pete Warden wrote a series of posts covering Deep Learning, Optimizing for QPU, and Image Recognition at https://petewarden.com/2014/06/09/deep-learning-on-the-raspberry-pi/, https://petewarden.com/2014/08/07/how-to-optimize-raspberry-pi-code-using-its-gpu/ and https://petewarden.com/2015/05/10/image-recognition-on-the-raspberry-pi-2/. He updated Eric's QPU assembler at https://github.com/jetpacapp/qpu-asm, and added QPU support to his DeepBeliefSDK at https://github.com/jetpacapp/DeepBeliefSDK/, and a QPU implementation of GEMM matrix-multiply at https://github.com/jetpacapp/pi-gemm
[2014-05-03] (QPU). Eric Lorimer wrote a set of posts on Hacking The GPU For Fun And Profit (including SHA hashing) at https://rpiplayground.wordpress.com/2014/05/03/hacking-the-gpu-for-fun-and-profit-pt-1/, and wrote his own QPU assembler at https://github.com/elorimer/rpi-playground
[2014-02-28] (QPU). Broadcom announced the release of full documentation for the VideoCore IV graphics core, and a complete source release of the graphics stack at https://www.raspberrypi.org/blog/a-birthday-present-from-broadcom/. Note this does NOT include VPU documentation, except insofar as the source drop includes samples of VPU assembly.
[2014-01-30] (QPU). Andrew Holme's QPU Fast Fourier Transform at http://www.aholme.co.uk/GPU_FFT/Main.htm.
Volker Barthelmann has been adding VideoCore IV to his tool chain and has a preliminary preview of his vasm assembler at http://www.ibaug.de/vasm/vasm.tar.gz, and vbcc compiler at http://www.ibaug.de/vbcc/vbcc_vc4.tar.gz.
David Given is adding VideoCore IV support to the ACK compiler & tool chain at http://tack.hg.sourceforge.net:8000/hgroot/tack/tack in the dtrg-videocore branch.
phire's https://github.com/phire/llvm repo contains some early work on porting LLVM to VideoCore, and is ripe for someone to grab and continue. phire has recently restarted work on this project.
mm120's https://github.com/mm120/binutils-vc4/tree/vc4 repo is a work in progress adding VideoCore support to GNU binutils. It seems to be coming along nicely, and I will add some prebuilt binaries for Linux, OSX, Windows and RPi/Linux to github soon.
thubble's https://github.com/thubble/vcdevtools, https://github.com/thubble/videocore-elf-dis and https://github.com/thubble/binutils-vc4/tree/vc4 repos cover a videocore disassembler (C#), a preliminary assembler (C) and a bootloader (asm) that can receive code via UART. thubble is particularly focussed on documenting the instructions of the integer vector processing unit (PPU).
mgottschlag's https://github.com/mgottschlag/resim and https://github.com/mgottschlag/vctools repos are focussed on tools and information for reverse engineering the BCM2835's hardware registers and functional blocks. mgottschlag is generating register access traces by simulating code sequences on a remote computer running a VideoCore emulator and forwarding them to a real BCM2835 running a small monitor.
dwelch67's https://github.com/dwelch67/rpigpu repo is focussed on bare metal samples written in C. dwelch67 has two experimental binary translators targeting the VideoCore instruction set. One translates MIPS to VideoCore and the other translates ARM Thumb to VideoCore.
All information here has been obtained solely by a combination of:
All activities were undertaken on a Raspberry Pi running Debian.
Those interested in the legal issues involved with reverse engineering activities, please review:
We do not accept materials nor publish materials relating to DRM or its circumvention.
Available at https://github.com/raspberrypi/firmware/tree/master/boot. Releases after 10 May 2012 are accompanied by a LICENSE.broadcom readme file containing a copyright notice, a disclaimer and guidelines for use. Prior to this date the readme was not present.
The distribution debian6-19-04-2012.zip from http://www.raspberrypi.org/downloads was used as a development platform for the majority of the work you find here.
The original Alphamosaic patents and patent applications provide a wealth of information for understanding the structure of the VideoCore instruction set and architecture. Whilst the instruction encodings are different, and only a limited range of instructions is indicated, they prove an invaluable reference for understanding the design space the engineers were exploring.
The newer Broadcom SoC patents and applications provide detailed information on how the VideoCore has been integrated into a broader platform setting. They are invaluable for gaining a deeper insight into the additional function units present in the BCM2835 and how they fit together.
Some snippets of information appear in third party documents.