popcornmix commented 12 years ago

Suggestion for property/value message passing interface from ARM to GPU.

We have a generic message passing structure. The first word is always the property of interest (could be small integers or fourcc codes). The remainder of the structure is property specific data whose length depends on the property. This data is used for both input and output data.

define MBOX_CHAN_GETPROP 8

define MBOX_CHAN_SETPROP 9

// generic property structure struct property_s { uint32_t property; uint32_t data[]; }

// get property void *property = <> uint32_t success; mbox_write(MBOX_CHAN_GETPROP, property); mbox_read(MBOX_CHAN_GETPROP, &success);

// set property void *property = <> uint32_t success; mbox_write(MBOX_CHAN_SETPROP, property); mbox_read(MBOX_CHAN_SETPROP, &success);

Example:

define CLOCK_EMMC 1

struct property_one_word_s { uint32_t property; uint32_t value; } *emmc_clock;

uint32_t success; emmc_clock = emmc_clock->property = CLOCK_EMMC; emmc_clock->value = 50000000; _wmb(); mbox_write(MBOX_CHAN_SETPROP, emmc_clock); mbox_read(MBOX_CHAN_SETPROP, &success); _rmb();

We could also use this to handle the existing framebuffer allocation and power control in a more consistent way.

Any config.txt properties can be retrieved with this mechanism. E.g. struct property_one_word_one_string_s { uint32_t property; uint32_t value; uint8_t string[MAX_STRING]; } *config_txt;

Similarly cmdline.txt and other ATAGs (e.g. memory size) can be got.

Properties that can be got or set: CLOCK_EMMC CLOCK_UART CLOCK_ARM CLOCK_CORE CLOCK_V3D CLOCK_H264 CLOCK_ISP POWER_CONTROL FRAMEBUFFER

Properties that can be got: CMDLINE CONFIG_STRING SERIAL_NUM BOARD_REV MAC_ADDRESS DISPLAY_WIDTH DISPLAY_HEIGHT DISPLAY_DEPTH DMA_CHANS MEMORY_SIZE

Any additional suggestions?

nomis commented 12 years ago

I'd suggest including the length of each property and the size of the request/response buffer, so that it's not limited to predefined 32 bit values. The buffer could also hold multiple properties in a TLV format.

For command line, serial number and board revision I think these would arrive too late in the boot process if the mbox driver had to be used to get them. The command line and memory size/reserved memory must be passed through the device tree file. As the MAC address is static it can go in the device tree file too.

For power control the easiest option is to define a unique 32-bit identifier for each device that can be turned on or off, as this seems to work quite well in the device tree file (currently I use the value to define the bit of the power state array). I'd also prefer to always get a response, it's easier to return success/failure that way...

For the clocks there should be properties for getting the current state (enabled/disabled), rate, min rate, max rate. Instead of having to define multiple tags for each of these there could be CLOCK_ENABLE, CLOCK_GET_MIN_RATE etc. that had a unique clock identifier as part of the value. I also need to know which clocks are parents of the other clocks (so that any power saving disabling of clocks doesn't have unintended side effects).

See include/linux/clk-provider.h for what the clock interface expects. The enable/disable functions can't sleep... I can implement polled mode mbox calls but it'd need to have a response in the mailbox as soon as I've written the request.

How much of the mbox/bell/property interface is generic to all BCM2835 and how much of it is Raspberry Pi specific?

rosery commented 12 years ago

Definitely the right sort of thing. Could you add SetProperty for

framebuffer_ignore_alpha=1 framebuffer_align=0x100000 hvs_swap_red_blue=1

then there would be no need to know about what is in kernel .img beforehand? Thanks John

nomis commented 12 years ago

I'd prefer not to have any access to config.txt at all because this information could be out of date. For example, we could have changed the clock rates since then (and forgotten that we did so because this might be a new kernel executed with kexec). You'd end up having to define two ways to access the same data.

popcornmix commented 12 years ago

@lp0

For command line, serial number and board revision I think these would arrive too late in the boot process if the mbox driver had to be used to get them. The command line and memory size/reserved memory must be passed through the device tree file. As the MAC address is static it can go in the device tree file too.

Yes, you will still get the device tree options. It's useful to have an alternative mechanism for non-linux/devtree users.

See include/linux/clk-provider.h for what the clock interface expects. The enable/disable functions can't sleep... I can implement polled mode mbox calls but it'd need to have a response in the mailbox as soon as I've written the request.

Well you'll have to poll for GPU to respond. I'd imagine that was typically in the tens of microseconds range, but could be worse when we are busy (I'd have thought well under 1ms)

How much of the mbox/bell/property interface is generic to all BCM2835 and how much of it is Raspberry Pi specific? The hardware is common to BCM2835. All BCM2835 platforms use the mboc/bell for some communications (like setting up vchiq). The property scheme hasn't been added to any platform yet, but could be added to all BCM2835 platforms if it proves useful.

nomis commented 12 years ago

Example get request: u32 buffer size = 4096 u32 tag = 0x12340001 (get clock state) u32 length = 4 u32 value = 0x42 (clock id 42) u32 tag = 0x12340002 (get clock rate) u32 length = 4 u32 value = 0x42 (clock id 42) u32 tag = 0x0 (end of data)

Example get response: u32 buffer size = 4096 u32 tag = 0x12340002 (clock state) u32 length = 8 u32 value[2] = 0x42, 0x0 (clock id 42, off) u32 tag = 0x0

Some sort of tag for "error, buffer is too small" would also be appropriate.

popcornmix commented 12 years ago

I'd prefer not to have any access to config.txt at all because this information could be out of date. For example, we could have changed the clock rates since then (and forgotten that we did so because this might be a new kernel executed with kexec). You'd end up having to define two ways to access the same data.

Yes, there's less reason for linux to use this, with devtree/ATAGs. However it could be useful for non-linux (e.g. RISCOS) so it seems reasonable to support it in firmware, but not make use of it in linux.

nomis commented 12 years ago

With this scheme you don't actually need two mbox channels, as the tag id indicates get/set. The response message should probably be the same value (memory location) as the request message so that it can be used asynchronously with multiple buffers.

nomis commented 12 years ago

Yes, there's less reason for linux to use this, with devtree/ATAGs. However it could be useful for non-linux (e.g. RISCOS) so it seems reasonable to support it in firmware, but not make use of it in linux.

My point is that it gets out of date quickly as config.txt configures the VC. For example, if the OS queries for a config.txt property and it's not there it would need to know what the default is, which could vary based on firmware version. If it always retrieved the current data instead it wouldn't need to do this.

popcornmix commented 12 years ago

@lp0

Example get request: ... Example get response:

Can you confirm that the same buffer is shared for the request and response? Presumably the success value returned through mbox could indicate buffer too small?

Feel free to specify the data format for any commands you care about.

nomis commented 12 years ago

Yes, the same buffer would be shared for the request and response.

The mbox values could a status code in the lower bits if we increased the alignment size of the buffer.

Could you enable the wiki for this repository so that a format could be documented?

popcornmix commented 12 years ago

Could you enable the wiki for this repository so that a format could be documented?

Should be done.

rosery commented 12 years ago

lp0: access to command line and memory sizes etc within RISCOS via this mechanism would be brilliant.. we do have sufficient control of the boot process that we can request this information in good time to make use of it.

nomis commented 12 years ago

Here's a first attempt at defining a format: https://github.com/raspberrypi/firmware/wiki/Mailbox-property-interface

I've left out the extra frame buffer tags for rosery to define. Depth should probably be replaced with a tag defining the whole fb_info format.

I don't know if this is going to be too verbose or not, but there's plenty of memory and a 4K page could let you get 200+ properties at once.

Update: I've specified that the value is padded to make the next tag 32-bit aligned.

rosery commented 12 years ago

@lp0 I have added tags for the 2 most critical RISCOS things.(I hope I've done it correctly).

Were you proposing that the current framebuffer request channel be redundant? if so, we probably would benefit from a tag that gave and received all the frame buffer elements in 1, including the bytes per line, which can be significantly greater than the pixels per line. (this is currently in the standard fb channel request.

Secondly, checking my understanding, in channel 8, the buffer used is presumably always within arm space, provided by the arm..

nomis commented 12 years ago

Were you proposing that the current framebuffer request channel be redundant?

Yes, it already can't be extended without breaking compatibility with existing kernels.

if so, we probably would benefit from a tag that gave and received all the frame buffer elements in 1

The request can include multiple tags that would get processed together.

Secondly, checking my understanding, in channel 8, the buffer used is presumably always within arm space, provided by the arm..

Yes

rosery commented 12 years ago

On 17/06/2012 16:02, Simon Arlott wrote:

Were you proposing that the current framebuffer request channel be redundant? Yes, it already can't be extended without breaking compatibility with existing kernels.

OK.. there are a couple more tags we need to add, then, to make sure we have covered the functionality that currently exists. I'll give that a go this evening.

John

John Ballance C.Eng MIET - jwb@rosery.net - 07976 295923

if so, we probably would benefit from a tag that gave and received all the frame buffer elements in 1 The request can include multiple tags that would get processed together.

Secondly, checking my understanding, in channel 8, the buffer used is presumably always within arm space, provided by the arm.. Yes

Reply to this email directly or view it on GitHub: https://github.com/raspberrypi/firmware/issues/47#issuecomment-6381780

popcornmix commented 12 years ago

get timing has a response length of 8, but only u32 is described set power state has a request length of 4, but two u32 values get buffer has a response length of 8, but three u32 values set buffer has a request/response length of 8, but three u32 values framebuffer needs a pitch parameter (GPU requires padding for non-aligned widths) Does framebuffer need virtual width/height for panning?

nomis commented 12 years ago

I've added all the other values that are currently used and fixed the sizes.

I've also added test versions of the setter properties so that new display settings can be proposed before setting them, and removed the set buffer option as it should be unnecessary if the settings are tested first.

nomis commented 12 years ago

Added a tag to release the buffer and included the alignment in the get buffer request.

popcornmix commented 12 years ago

I'm not overly keen on the framebuffer tag interface. All the other tags I can act on and fill in the response buffer as I go. I would prefer a single tag containing all the framebuffer properties. If you want to just modify a subset of that, you can do a read/modify/write. There's a bit of uncertainty if a list of tags contains the same tag twice. For non-framebuffer tags, I would just act on it twice and respond twice. For framebuffer tags, I guess I will ignore the first one. I assume you don't mind about the order of tags returned? Might be easiest to just set a bit for each framebuffer tag seen, and respond to them in fixed order at the end. I assume if I don't like any of the framebuffer tags, then I will ignore them all? E.g. if I'm in 640x480x32bpp and you ask for 4096x1536x16bpp, then I wouldn't put you in 640x480x16bpp mode, because that was the only tag I liked?

nomis commented 12 years ago

I agree that the other tags are easier to work with but the frame buffer is more complex. A single tag becomes limiting when it needs to be extended.

Yes, any order of response is ok - it'll just scan through them to get the ones it wants. The same tag shouldn't occur twice in the request so the behaviour in that scenario would be undefined. The same tag twice in the response also makes the result undefined. Using the last value in each case would be the likely outcome.

Yes, in Set mode it would refuse to change anything. In Test mode it should try to correct the values to the nearest limit or not return them if there's nothing it can do to fix it.

rosery commented 12 years ago

@popcornmix @lp0 Doing the whole frame buffer in 1 tag would fit my needs well

What I would do, probably, is set/specify framealignment, PixelOrder and Alphamode individually, as these would need doing just once (in a while), whereas all the elements of framebuffer would need altering in unison for a screen mode change. Read/Modify/Write is fine..

nomis commented 12 years ago

I've modified the clock tags to specify that it always returns a tag even if the clock doesn't exist so that it's easier to process the response if someone does request the same clock rate is set multiple times...

nomis commented 12 years ago

What I would do, probably, is set/specify framealignment, PixelOrder and Alphamode individually, as these would need doing just once (in a while)

But now you've got some parts of it that get set once and some that get set every time you make a change and the result if you change both of them at the same time isn't defined (can one of those changes fail but the other succed?).

popcornmix commented 12 years ago

Can I assume you won't mix test/get/set calls in a single message? Will get buffer wait for other framebuffer set calls to be processed. (Actually can you list the tags you would expect to be processed at end)

Suppose you have a framebuffer in use that you have previously called get buffer on. Now you call set width/height to a larger size. Will that fail until a release buffer is called?

nomis commented 12 years ago

Can I assume you won't mix test/get/set calls in a single message?

Yes, mixing test and set is invalid. Test and get would be ok in theory but I can see that being difficult to process.

Will get buffer wait for other framebuffer set calls to be processed.

Yes, I'd expect get to return after all the set operation is finished.

Suppose you have a framebuffer in use that you have previously called get buffer on. Now you call set width/height to a larger size. Will that fail until a release buffer is called?

If you don't include a get buffer tag then the buffer is not allowed to change base or size so the request would fail. If a get buffer tag is included then the new buffer is returned when the change is successful.

I've renamed "get buffer" to "allocate buffer".

popcornmix commented 12 years ago

Okay, so a sequence of sets and a get buffer implicitly frees the old buffer. You don't have to explicitly call release buffer except when the framebuffer is destroyed.

nomis commented 12 years ago

Correct - I'll add that to the page

nomis commented 12 years ago

MEMORY_SIZE

I know you want to make it easier for OSes that don't want to parse DT or ATAGs but isn't there a problem with using memory to pass buffers around before you know how much memory you have?

If issue #45 gets implemented then additional memory can be allocated to the VC in suitable chunks.

@rosery, what does RISCOS do to determine available memory?

I've added tags for this anyway...

rosery commented 12 years ago

@lp0 In RISCOS what happens is (roughly): HAL + kernel starts at physical address 0 - this is our kernel.img some minimal code is run in the HAL to gain control of the ARM chip. it will then interact with the mailbox, before the mmu is enabled, to inquire the mem size. (and get an initial frame buffer..). a few other hardware related things are also initialised.

This kernel is then invoked, with parameters that include the location of the 'kernel.img' and the amount of ram available. It is only at this stage that general memory map is constructed.

The ram size was initially taken from ATAGs, but this mailbox method would suit considerably better.

nomis commented 12 years ago

Ok. I've added a get clocks property too, so that we know the dependencies between the clocks.

rosery commented 12 years ago

Frame buffer stuff is Looking good. A question: allocating a frame buffer requires a min of set width/height/depth. Will the other tags get returned, even if not requested?.. e.g. pitch is (almost always) implicitly set from the 'fit' of the request into the screen attached.. etc

nomis commented 12 years ago

Tags won't be returned unless requested. I'll remove the Test/Set versions of pitch. I don't want to merge it into "allocate buffer" as that complicates it by requiring that the pitch also doesn't change if the existing buffer is being reused.

swarren commented 12 years ago

Since lp0 wrote the wiki now popcornmix, I assume this is some completely new API that isn't defined/implemented yet (which is slightly at odds to popcornmix having said "We have a generic message passing structure" rather than "We could/can have ...")

This sounds like a reasonable approach in general.

The buffer alignment to 256 bytes isn't hard to achieve, but potentially does waste a bit of memory. Can we make the buffers 16-byte aligned and use the entire 28-bit message portion of the mailbox to contain the buffer address? We can simply move the request/response code to be the second u32 in the buffer. The only issue here would be the lack of response code if the memory address was invalid, but that seems acceptable to me.

I agree with lp0's comments that all APIs should reflect current state in general, so that the responses are always "accurate" at current time. Of course, we can always define some messages that specifically request initial state or config.txt content if there's a use for that, although it does seem slightly unlikely.

I would suggest that for each tag, we take the max of request and response data size, and allocate that much space in the buffer in the request even if it isn't needed. That way, when the VC is generating the response, it won't need to re-write the tag IDs and move them about based on request/response size differences, but rather it will just fill the data into the pre-existing space after the tag. Perhaps each tag should also contain an individual allocated length field for error-checking purposes and extensibility - something like:

Overall format: u32 overall buffer length u32 request/response code

Then a list of tags, each: u32 tag buffer length (this length field's length plus tag ID field's length plus length of request/response data area following) u32 tag ID u8[tag buffer length - 8] request/response data

For variable-length response (e.g. a request like get command line), the tag-specific response format could be : u32 needed length null-terminated string

That way, if the request's tag buffer length (minus 8) isn't large enough for the response data, the VC can fill in what it can, but still indicate the length needed to return the entire string, and then the ARM-side can send a larger buffer next time.

I haven't actually reviewed the specific messages in the wiki in detail yet.

swarren commented 12 years ago

I put some comments inline in the wiki. I hope that's a reasonable place - easier to see which parts I'm talking about that way.

Re:

See include/linux/clk-provider.h for what the clock interface expects. The enable/disable functions can't sleep... I can implement polled mode mbox calls but it'd need to have a response in the mailbox as soon as I've written the request.

Even though the Linux clock API enable/disable functions can't sleep, prepare/unprepare can. I believe the idea is that if you can enable/disable your clocks without blocking/scheduling, you implement enable/disable the clocks inside the enable/disable functions, but if you need to block/schedule, you enable/disable the clocks inside the prepare/unprepare functions. BCM2835-specific drivers might be aware of which of prepare or enable is going to actually enable the clocks, and call them at the appropriate times, e.g. calling prepare later rather than earlier if possible. And I think the differentiation is whether enable has to block/schedule more than the amount of time it takes, so you could always poll in enable for a "long" time if you have to, although IIRC this ends up holding clock locks while doing it, so isn't the best idea.

nomis commented 12 years ago

I would suggest that for each tag, we take the max of request and response data size, and allocate that much space in the buffer in the request even if it isn't needed.

The caller may not know the size of the response in advance if it gets increased to provide more information. You first suggest 256 bytes is too much and then to leave the request unmodified, which will also use more memory?

That way, if the request's tag buffer length (minus 8) isn't large enough for the response data, the VC can fill in what it can, but still indicate the length needed to return the entire string, and then the ARM-side can send a larger buffer next time.

I wouldn't want to have to do that in the kernel... it'd need to be able to re-arrange all the tags to accommodate every one whose response was now too big.

Even though the Linux clock API enable/disable functions can't sleep, prepare/unprepare can.

The description also implies that prepare is too early to enable them. I was planning on polling until it completes.

swarren commented 12 years ago

Re: tag re-allocation: I'd actually expect any message to contain only 1 tag at a time, so there wouldn't be any need for the kernel to re-arrange tags if the first attempt didn't have a large enough buffer. Still, that would also imply that the VC wouldn't have to re-arrange tags in order to generate the responses, which ends up making my suggestion less useful anyway.

Re: prepare API: I believe the whole reason for the prepare API's existence (it was only fairly recently added IIRC) is so that prepare can enable the clocks if the code needed to enable the clock can't be implemented in enable for some reason, such as blocking. Still, if polling won't take too long, that's probably acceptable too.

swarren commented 12 years ago

Oh, re: "You first suggest 256 bytes is too much and then to leave the request unmodified, which will also use more memory?"

I didn't suggest leaving the request unmodified, but rather leave all the tag and length fields in the same place, and just replace the request data with response data. That way, you don't have to move all the tags about when generating the response. And the size of the request/response itself wasn't the issue I wanted fixed, but the need to align the /start/ of the packet to a 256-byte boundary.

rosery commented 12 years ago

@swarren - the concept of padding the datalength of the request tag to the same as the response tag, for most tags, makes excellent sense. It means that the responder does not need to do any allocation or tag shifting, instead all it has to do is insert responses in the spaces provided. The only sort of tag that may need additional treatment is a tag of unknown response length, such as reading a commandline string. In this case, the behaviour should perhaps be to return what can fit in the space provided, with one field reporting the total length needed for a full reply

adammw commented 12 years ago

sorry to step on your toes while still designing, but will these apis/channels allow changing of the overscan (and similar config.txt parameters) while the device is running?

popcornmix commented 12 years ago

@adammw No. While we can add tags in to describe scaling of the framebuffer, that wouldn't help a linux user as there is no plumbing that will make this get called. If you have further comments on this pleasse create a new issue.

nomis commented 12 years ago

Those could be added to the framebuffer driver as sysfs parameters

popcornmix commented 12 years ago

Okay we should add tags to describe this. We already have source offset_x, offset_y, width and height. We need the same four parameters in destination space.

nomis commented 12 years ago

I've updated the format to respond in-place.

nomis commented 12 years ago

Can you support overscan top/bottom/left/right parameters and change them without moving the buffer base address?

popcornmix commented 12 years ago

Yes.

popcornmix commented 12 years ago

I've updated the format to respond in-place.

This isn't clear to me. So the tags now remain in place for the response? is the:

u32: value buffer size in bytes

the maximum of request length and response length?

Do you want ARM and VC memory addresses as bus or physcal addressses?

nomis commented 12 years ago

It's the maximum size of the request/response length, the value after that indicates how much is actually used and if it's a request/response. Yes, they now remain in place.

Be careful how you handle the kernel being newer/older than the VC that provide a buffer that's smaller/larger than you expect for the response.

I'd prefer the ARM physical addresses as used by the existing framebuffer interface.

popcornmix commented 12 years ago

I'd prefer the ARM physical addresses as used by the existing framebuffer interface.

I believe the framebuffer interface uses bus addresses (i.e. GPU addresses like 0x4xxxxxxx). From my point of view a physical address is before the VideoCore MMU, e.g. 0x00000000 for RAM 0x20000000 for peripherals

A bus address is after the VideoCore MMU. e.g. 0x40000000 for RAM 0x7e000000 for peripherals

popcornmix commented 12 years ago

With set power state

Bit 0: 0=off (failed), 1=on (ok)

This seems ambiguous. If I just turned it off (successfully) do I return 0=off (failed) or 1=on (okay) ?

raspberrypi / firmware

Add new property/value mailbox channel #47

define MBOX_CHAN_GETPROP 8

define MBOX_CHAN_SETPROP 9

define CLOCK_EMMC 1