Many thanks for writing a detailed manual about the use and rationale for the Microkit. It is an important resource. I've read the document in details recently; I have some suggestions and questions.

Overview

3rd paragraph: “a program has three distinct entry points: init, notified and, optionally, protected” —> “a program has four distinct entry points: init, notified, optionally, protected, and fault if the program has children”

Document Structure

… generating a firmware image …: the use of “firmware” may be misleading; target platforms may have firmware of their own and this “firmware” image is not intended to be a substitute for those; perhaps use “system” (used earlier and later) or “binary” instead?

Entry points

“A PD does not have to have child PDs, therefore the fault entry point is optional.” —> “A PD does not have to have child PDs, therefore the fault entry point is only required for a parent PD.” (Otherwise the word “optional“ is used twice in these paragraphs but with slightly different semantic weights, which can be confusing: protected is truly optional, never required; but fault is required for a parent PD; unless I'm confused about use of fault, which is possible.)

Scheduling

“The PD has a number of scheduling attributes …” —> “A PD has a number of scheduling attributes (all optional except priority) …” (so "optionality" is discussed/described for all abstractions in the same part of the text)
“PDs with full budgets are scheduled round-robin with a time slice defined by the period.”: Strictly speaking, isn’t this true only if all the relevant PDs have the same period?
Also I would rewrite the sentence to “PDs with equal priority and full budgets are scheduled round-robin with a time slice defined by the period.”
Does the fifth paragraph need another clause for “fault”; something like “or when the PD receives a fault, whereby the passive PD will run on the scheduling context of the parent PD”
Have nested PDs any bearings on scheduling? I suspect nothing, but still worth mentioning? What happens if a child PD incurs a fault, but the parent PD has a very low priority? Is it possible that the handling of the fault could be delayed for an arbitrarily long time? How is this different than when the fault is handled by the Monitor?
How accurate is it to say that execution of any entry points in a PD are deferred until the current one returns. Isn't it true that notified, protected, or fault will not execute until init returns? I feel there are two important ways that a PD is different than a Linux process: (1) entry points are expected to return pretty quickly (in contrast, Linux's main may never return); (2) entry points are not interrupt-able/preempt-able by the invocation of another entry point in the same PD (in contrast, main can be interrupted).

Memory Regions

Are memory regions allocated by the build process initialized? sddf seems to rely on these regions being initialized to zero (e.g., queue.h: tail and head are never explicitly initialized anywhere that I could see).
What’s the default page_size? Smallest available on the given architecture?

Protected procedure

Second paragraph: “When a protection domain calls a protected procedure, the procedure executes within the context of the providing protection domain.”: Is this still true if the PD is passive?
“…call graph between PDs form a directed…” —> “…call graph between PDs must form a directed…” (or “form” —> “forms”)

Notification

What happens if multiple notifications take place while the hosting PD cannot run notified (e.g., because a higher priority PD is running)? Are these notifications coalesced, leading to a single execution of the relevant notified or will notified be executed as many times as the number of notifications received?
Is this why sddf takes great pain to avoid unnecessary calls to notify?
What happens if a notification appears while notified is already running?
What does microkit_notify_delayed do?

Interrupts

Are interrupts masked until disposed off by a call to microkit_irq_ack?
I could not find a mention that one should call microkit_irq_ack after handling an interrupt.
Some sddf examples use the call microkit_irq_ack_delayed; what does it do?

Faults

I’m assuming that by default not only does the Monitor print out details about the fault but also suspends the execution of the offending thread.
The term “Monitor” is introduced here for the first time and does not appear to be defined clearly anywhere. If the manual decides on using "Monitor" more consistently, then, perhaps, wherever the text says “system” it might be advantageously replaced with “monitor” instead?

libmicrokit

This section does not mention fault nor its signature.
Also missing are microkit_notify_delayed and microkit_irq_ack_delayed

Other small changes:

Overview section, 2nd paragraph: “an Microkit” —> “a Microkit”
Concepts, last bullet: “faults” —> “fault” (all other bullets use singular)
Entry points, 2nd paragraph: “notify” —> “notified”
Entry points, 3rd paragraph: “When an Microkit…” —> “When a Microkit…”
Scheduling section, 5th paragraph: “will have it's [...] context” -> “will have its [...] context”
Memory Regions, last paragraph: “mapping” —> “mappings”
Header: “Microkit tool” —> “Microkit Tool” (change also in Document Structure section)
Header: “Adding platform support” —> “Adding Platform Support”
System Description File, protection_domain, passive: “have it's scheduling” —> “have its scheduling”
System Description File: capitalization of the explanations of each parameter is not consistent; some start with lowercase; others with an uppercase (see “program_image” vs. “vaddr”)

Thank you for taking the time to make a detailed write-up. The feedback is greatly appreciated.

The fixes that I mention below are from https://github.com/seL4/microkit/pull/143.

3rd paragraph: “a program has three distinct entry points: init, notified and, optionally, protected” —> “a program has four distinct entry points: init, notified, optionally, protected, and fault if the program has children”

Changed to 'has four distinct entry points: init, notified and, potentially, protected, fault.' because I don't want to introduce the concept of child PDs so early on.

… generating a firmware image …: the use of “firmware” may be misleading; target platforms may have firmware of their own and this “firmware” image is not intended to be a substitute for those; perhaps use “system” (used earlier and later) or “binary” instead?

Changed to 'system image'.

“A PD does not have to have child PDs, therefore the fault entry point is optional.” —> “A PD does not have to have child PDs, therefore the fault entry point is only required for a parent PD.” (Otherwise the word “optional“ is used twice in these paragraphs but with slightly different semantic weights, which can be confusing: protected is truly optional, never required; but fault is required for a parent PD; unless I'm confused about use of fault, which is possible.)

Fixed.

“The PD has a number of scheduling attributes …” —> “A PD has a number of scheduling attributes (all optional except priority) …” (so "optionality" is discussed/described for all abstractions in the same part of the text)

The PD still has all of those scheduling attributes, it just depends what the user decides to be explicit about. I think the entry points are an exception since they're not mentioned in the SDF section. I think it's better to leave the attributes details and whether or not they are optional to the SDF section.

“PDs with full budgets are scheduled round-robin with a time slice defined by the period.”: Strictly speaking, isn’t this true only if all the relevant PDs have the same period?

It's a bit ambiguous with this wording. It would still be round robin but when each PD with be preempted would depend on its own period.

I haven't fixed this in the manual yet because I think the whole scheduling section needs a closer look. This needs to be carefully done so I'll do it in a separate PR.

Also I would rewrite the sentence to “PDs with equal priority and full budgets are scheduled round-robin with a time slice defined by the period.”

Does the fifth paragraph need another clause for “fault”; something like “or when the PD receives a fault, whereby the passive PD will run on the scheduling context of the parent PD”

Fixed.

Have nested PDs any bearings on scheduling? I suspect nothing, but still worth mentioning? What happens if a child PD incurs a fault, but the parent PD has a very low priority? Is it possible that the handling of the fault could be delayed for an arbitrarily long time? How is this different than when the fault is handled by the Monitor?

Yes the handling of the fault could be delayed. It entirely depends on the priority and scheduling attributes of the rest of the system.

The default system fault handler has the highest priority and so will always receive the fault immediately.

I will add something about this in the section.

How accurate is it to say that execution of any entry points in a PD are deferred until the current one returns. Isn't it true that notified, protected, or fault will not execute until init returns? I feel there are two important ways that a PD is different than a Linux process: (1) entry points are expected to return pretty quickly (in contrast, Linux's main may never return); (2) entry points are not interrupt-able/preempt-able by the invocation of another entry point in the same PD (in contrast, main can be interrupted).

Yes it is true that no entry points will be entered until init finishes.

I'm just trying to think what exactly we should mention in the scheduling part. Some of this should only be in the tutorial as we don't want to make the manual too heavy on getting people to understand how an event based system works. I will think about it.

Are memory regions allocated by the build process initialized? sddf seems to rely on these regions being initialized to zero (e.g., queue.h: tail and head are never explicitly initialized anywhere that I could see).

Memory regions that are within main memory/RAM are zero-initialised by the kernel at boot time. This is now mentioned in the section on memory regions.

What’s the default page_size? Smallest available on the given architecture?

This is mentioned in the section on memory regions. I've added it to the SDF section and outlined all the possible values for the page size for each architecture Microkit supports.

Second paragraph: “When a protection domain calls a protected procedure, the procedure executes within the context of the providing protection domain.”: Is this still true if the PD is passive?

Yes, this is not talking about scheduling contexts but a more abstract 'context'. It's just trying to say that the procedure executes inside the PD that provides the procedure.

“…call graph between PDs form a directed…” —> “…call graph between PDs must form a directed…” (or “form” —> “forms”)

Fixed.

What happens if multiple notifications take place while the hosting PD cannot run notified (e.g., because a higher priority PD is running)? Are these notifications coalesced, leading to a single execution of the relevant notified or will notified be executed as many times as the number of notifications received?

If you're talking about the same channel notifying multiple times, then yes the notifications lead to a single execution of notified if they all happen before the receiving PD has a chance to process any of them.

This is probably worth mentioning in the scheduling section of the manual.

Is this why sddf takes great pain to avoid unnecessary calls to notify?

Yes, partially. In general we want to avoid system calls as much as possible while still letting things progress. Another reason is that if you are notifying a PD with a higher priority, this will be guaranteed to call a context switch, something which you may not want to do in all cases.

What happens if a notification appears while notified is already running?

Then when the notified finishes we will go back to the Microkit event handler, check what events are available and see that a new notification has happened and hence notified will be called.

What does microkit_notify_delayed do?

The motivation for this API (which is not in the mainline Microkit yet) is to only use one system call to notify another PD and wait on futher events.

Let's say you are in notified. You have finished processing the notification and call microkit_notify to say that you are finished with the processing. Then notified returns and we go back into the event loop and do another system call in order to wait for the next events. microkit_notify_delayed sets up the notify to happen when we get back into the event loop and uses one system call to do the notify as well as wait for the next event. This can save hundreds of cycles which is why it is used in performance critical code.

Are interrupts masked until disposed off by a call to microkit_irq_ack?

Yes, mentioned in the manual now.

I could not find a mention that one should call microkit_irq_ack after handling an interrupt.

Fixed. It's in the tutorial as well.

Some sddf examples use the call microkit_irq_ack_delayed; what does it do?

It's similar to microkit_notify_delayed but for interrupts instead of notifications.

I’m assuming that by default not only does the Monitor print out details about the fault but also suspends the execution of the offending thread.

A fault causes the suspension of the PD, I will mention this in the section on faults. The monitor does not explicitly unsuspend the PD.

The term “Monitor” is introduced here for the first time and does not appear to be defined clearly anywhere. If the manual decides on using "Monitor" more consistently, then, perhaps, wherever the text says “system” it might be advantageously replaced with “monitor” instead?

The term “Monitor” is introduced here for the first time and does not appear to be defined clearly anywhere. If the manual decides on using "Monitor" more consistently, then, perhaps, wherever the text says “system” it might be advantageously replaced with “monitor” instead?

This section does not mention fault nor its signature.

Fixed.

Also missing are microkit_notify_delayed and microkit_irq_ack_delayed

Not technically missing since they aren't upstreamed.

Overview section, 2nd paragraph: “an Microkit” —> “a Microkit” Concepts, last bullet: “faults” —> “fault” (all other bullets use singular) Entry points, 2nd paragraph: “notify” —> “notified” Entry points, 3rd paragraph: “When an Microkit…” —> “When a Microkit…” Scheduling section, 5th paragraph: “will have it's [...] context” -> “will have its [...] context” Memory Regions, last paragraph: “mapping” —> “mappings” Header: “Microkit tool” —> “Microkit Tool” (change also in Document Structure section) Header: “Adding platform support” —> “Adding Platform Support” System Description File, protection_domain, passive: “have it's scheduling” —> “have its scheduling” System Description File: capitalization of the explanations of each parameter is not consistent; some start with lowercase; others with an uppercase (see “program_image” vs. “vaddr”)

All fixed.

seL4 / microkit

manual: typos and other questions #142

Overview

Document Structure

Entry points

Scheduling

Memory Regions

Protected procedure

Notification

Interrupts

Faults

libmicrokit

Other small changes: