mn416 / QPULib

Language and compiler for the Raspberry Pi GPU

Question: Aren't there 16 QPU's? #33

Closed by wimrijnders 6 years ago

wimrijnders commented 6 years ago

In target/Emulator.h, I encountered the following:

#define MAX_QPUS 12

However, the VideoCore IV Reference Guide implies that there are 16 QPU's, in the diagram on page 13 and the text on page 14.

I do have to admit that the text in the guide is a bit vague. From page 14:

QPUs are organized into groups of up to four, termed slices,....

The words 'up to' leave too much room for speculation.

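Questions:

- How did you determine that the number of QPU's is 12?
- Is there a way to detect the number of QPU's at runtime?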

In addition, I note that the link to the reference guide on the QPULib page is stale. This is a working link:

mn416 commented 6 years ago

Hi @wimrijnders,

I imagine the circuit generator for VideoCore is parameterised by things such as the number of QPUs. This number may vary from SoC to SoC, so the manual is deliberately vague.

> How did you determine that the number of QPU's is 12?

Prior work suggests 12:

12 also makes sense as the Pi's GFLOPS is advertised at 24 rather than 32.
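The 24 versus 32 GFLOPS argument can be checked with a little arithmetic. The sketch below assumes figures that are not stated in this thread: a 250 MHz QPU clock, 4 physical SIMD lanes per QPU, and 2 floating-point operations per lane per cycle (one from the add ALU, one from the multiply ALU). The names `peak_mflops`, `QPU_CLOCK_MHZ`, `LANES_PER_QPU` and `FLOPS_PER_LANE` are made up for illustration.

```c
/* Assumed hardware figures (not taken from this thread):
 * 250 MHz QPU clock, 4 physical SIMD lanes per QPU, and
 * 2 floating-point ops per lane per cycle (add ALU + mul ALU). */
#define QPU_CLOCK_MHZ   250u
#define LANES_PER_QPU   4u
#define FLOPS_PER_LANE  2u

/* Theoretical peak throughput in MFLOPS for a given number of QPUs. */
static unsigned peak_mflops(unsigned num_qpus)
{
    return num_qpus * LANES_PER_QPU * FLOPS_PER_LANE * QPU_CLOCK_MHZ;
}
```

Under these assumptions, `peak_mflops(12)` gives 24000 (24 GFLOPS) and `peak_mflops(16)` gives 32000 (32 GFLOPS), which is consistent with 12 QPUs matching the advertised figure.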

> Is there a way to detect the number of QPU's at runtime?

May be possible via the mailbox, but I'm not sure.

wimrijnders commented 6 years ago

Yes, both references say 12. So the 'up to four' equals 3!

This was an opportunity to scan both repos (been there before, will probably be there often again). Lots of interesting stuff there, although it mostly amounts to scavenging. The qpu-tutorial is useful for an overall understanding.

Haven't found a mailbox call yet. This is the best mailbox message reference I could find. Perhaps you know of a better one?

Terminus-IMRC commented 6 years ago

According to V3D_IDENT1 (p.97), there are 4 QPUs per slice and 3 slices on a Raspberry Pi, so there are 12 QPUs in total.
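As a rough sketch of how that count falls out of the register: the code below decodes an already-read V3D_IDENT1 value. The bit positions (NSLC in bits 7:4, QUPS in bits 11:8) are my reading of the table on p.97 and should be checked against the guide. The function names are hypothetical, and obtaining the raw value from the hardware (for example by mapping the V3D register block) is deliberately left out.

```c
#include <stdint.h>

/* Assumed field positions in V3D_IDENT1; verify against p.97 of the guide. */
#define IDENT1_NSLC_SHIFT 4u    /* NSLC: number of slices, bits 7:4  */
#define IDENT1_QUPS_SHIFT 8u    /* QUPS: QPUs per slice,   bits 11:8 */
#define IDENT1_FIELD_MASK 0xFu  /* both fields are 4 bits wide       */

/* Number of slices reported by the register value. */
static unsigned ident1_num_slices(uint32_t ident1)
{
    return (ident1 >> IDENT1_NSLC_SHIFT) & IDENT1_FIELD_MASK;
}

/* Number of QPUs per slice reported by the register value. */
static unsigned ident1_qpus_per_slice(uint32_t ident1)
{
    return (ident1 >> IDENT1_QUPS_SHIFT) & IDENT1_FIELD_MASK;
}

/* Total QPUs = slices * QPUs per slice. */
static unsigned ident1_total_qpus(uint32_t ident1)
{
    return ident1_num_slices(ident1) * ident1_qpus_per_slice(ident1);
}
```

For a value with NSLC = 3 and QUPS = 4 in those positions, `ident1_total_qpus` returns 12.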

wimrijnders commented 6 years ago

@Terminus-IMRC Hi there, nice of you to drop by!

This is great, you refer to a way of determining the number of QPU's from the hardware. Can you suggest some code to read that register and return the value?

Ah, I see, you took the values from the RESET column, which are for the reference configuration.

Terminus-IMRC commented 6 years ago

As for code, https://github.com/Terminus-IMRC/qpuinfo may help. If you want more QPU code examples, here they are: https://gist.github.com/Terminus-IMRC/c5d1f6f78c890c26947a4553296b50d6 .

I took the values from both the RESET column and the real hardware.

wimrijnders commented 6 years ago

:+1: Excellent, thanks.

Scanned your github, you're very deep into QPU programming. My respect, I'm just starting.

wimrijnders commented 6 years ago

Fixed by #45

n3n7i commented 5 years ago

There's a solid mention of 16 QPU's in the VideoCore IV docs, pages 89-90: registers V3D_SQRSV0 and V3D_SQRSV1 handle scheduling for QPU's 0-7 and QPU's 8-15.

Terminus-IMRC commented 5 years ago

@n3n7i No. Even if you enable QPUs 12-15 in those registers, your program won't be executed on them.

wimrijnders commented 5 years ago

@n3n7i I think those registers are just for the max available number of QPU's. They won't do anything if the given QPU is not there.

The place to look for the number of QPU's is the bitfields QUPS and NSLC in register V3D_IDENT1, page 97.

n3n7i commented 5 years ago

@Terminus-IMRC Have you checked to see what the typical config reads as? It may have some clues, and I'd be curious. Possibly they are assigned to one of the non-user-code types?

The videocore chip may have room for two threads / programs running, so that could also be the 3/4 slice limit

Terminus-IMRC commented 5 years ago

@n3n7i V3D_SQRSV0 and V3D_SQRSV1 both read as 0x00000000 by default (please run https://github.com/Terminus-IMRC/qpuinfo or refer to https://gist.github.com/Terminus-IMRC/6540ca707997dea2d767 ), which means that QPUs 0-15 are all enabled. However, QPUs 12-15 do not exist (or are disabled for yield reasons), so programs won't be executed on them.
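To make the "enabled in the register but not physically present" point concrete, here is a minimal sketch that counts the QPUs usable for user programs. It assumes each QPU has a 4-bit reservation field (QPUs 0-7 in V3D_SQRSV0, QPUs 8-15 in V3D_SQRSV1) and that bit 0 of a field means "do not use for user programs"; please verify this against pp.89-90. The function names and the `physical_qpus` parameter are hypothetical.

```c
#include <stdint.h>

/* Assumed meaning of bit 0 in each 4-bit reservation field. */
#define SQRSV_NO_USER_PROGRAMS 0x1u

/* Returns 1 if the reservation bits allow user programs on the given QPU. */
static int qpu_allows_user_programs(uint32_t sqrsv0, uint32_t sqrsv1,
                                    unsigned qpu)
{
    uint32_t reg = (qpu < 8u) ? sqrsv0 : sqrsv1;
    uint32_t field = (reg >> (4u * (qpu % 8u))) & 0xFu;
    return (field & SQRSV_NO_USER_PROGRAMS) == 0u;
}

/* Counts QPUs that both exist physically and are not reserved away
 * from user programs. Register fields for QPUs at or beyond
 * physical_qpus are ignored, since nothing can run there. */
static unsigned usable_user_qpus(uint32_t sqrsv0, uint32_t sqrsv1,
                                 unsigned physical_qpus)
{
    unsigned count = 0;
    for (unsigned q = 0; q < physical_qpus && q < 16u; q++) {
        if (qpu_allows_user_programs(sqrsv0, sqrsv1, q)) {
            count++;
        }
    }
    return count;
}
```

With both registers at 0 and 12 physical QPUs the count is 12, and changing the fields for QPUs 12-15 makes no difference to it.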

As for threads, registers are shared between two threads (though the address is swapped, p.20), so it is unlikely that threads are implemented to use QPU 12-15.

n3n7i commented 5 years ago

We could try disabling slice 0 for user programs and then see if a 12-core program will still run happily. It may automatically switch between slices.

Terminus-IMRC commented 5 years ago

In my experience, if you push 12 QPU programs (each looping infinitely) onto V3D_SRQPC while only 11 QPUs are enabled in V3D_SQRSVn, one program will be stuck in the queue. That is, in the V3D_SRQCS register QPURQCM will be 12 and QPURQL will be 1.
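For reference, a small sketch of how such a V3D_SRQCS value could be unpacked. The field positions used here (QPURQL in bits 5:0, QPURQERR in bit 7, QPURQCM in bits 15:8, QPURQCC in bits 23:16) are my assumption about the layout and should be checked against the guide. The struct and function names are made up.

```c
#include <stdint.h>

/* Decoded view of V3D_SRQCS (assumed field layout, see lead-in). */
typedef struct {
    unsigned queue_length;   /* QPURQL:   requests still queued   */
    unsigned error;          /* QPURQERR: queue error flag        */
    unsigned requests_made;  /* QPURQCM:  count of requests made  */
    unsigned requests_done;  /* QPURQCC:  count of requests done  */
} srqcs_fields;

/* Splits a raw V3D_SRQCS value into its fields. */
static srqcs_fields decode_srqcs(uint32_t srqcs)
{
    srqcs_fields f;
    f.queue_length  = srqcs & 0x3Fu;
    f.error         = (srqcs >> 7) & 0x1u;
    f.requests_made = (srqcs >> 8) & 0xFFu;
    f.requests_done = (srqcs >> 16) & 0xFFu;
    return f;
}
```

In the situation described above, a raw value of 0x00000C01 would decode to 12 requests made, 0 completed, and 1 left in the queue.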

n3n7i commented 5 years ago

I found this bit of spec for the arm1176

The ARM instruction set supports the connection of 16 coprocessors, numbered 0-15, to an ARM processor. In the processor, the following coprocessor numbers are reserved: CP10 (VFP control), CP11 (VFP control), CP14 (Debug and ETM control), CP15 (System control). You can use CP0-9, CP12, and CP13 for your own external coprocessors.

Terminus-IMRC commented 5 years ago

Coprocessors in the ARM CPU sense are not what you think they are. They are located inside the ARM and do low-level work such as debugging and controlling CPU functions and caches.

The QPUs are not located inside the ARM. Instead, they sit on the AXI bus inside the SoC (cf. p.13). The ARM and other components such as memory are also connected to the same bus.