Memory Lane - Zero Cost Logging with deferred formatting

perlindgren commented 4 years ago

This RFC is discussing experiments on Zero Cost Logging for RTFM based applications.

v0.1.0, 2020-03-21 v0.1.1, 2020-03-21 (added more considerations) v0.1.2, 2020-03-24 (altered syntax, to be more Rust like) v0.2.0, 2020-03-25 (alternative approach leveraging messages queues, and relation to run-time verification) v0.2.1 2020-03-31 (alternative approach considerations, improved clarity)

Motivation

Logging (aka printf debugging) is an instrumental tool (no pun) for the developer. However, the overhead of traditional target side logging implies costly formatting and transmission delays that might be intrusive to the behaviour of the application.

Deferred formatting. By deferring formatting to the host side the overhead may be significantly reduced. This however implies that the host need to be able to reconstruct the type of data produced by the target.
Transport. By using in memory buffers polled by the host instead of the target using the CPU to actively transmit data, we may reduce the negative impact of logging/tracing to the behaviour of the application.
Soundness. In a preemptive system like RTFM, logging operations may be interrupted by higher priority task (that may also perform logging). Multiple logging streams may be adopted to allow for lock and race free access.

Overall Design

In order to address the above challenges different approaches may be adopted. Here one approach based on typed "memory lanes" is proposed, that exploits the declarative and static nature of RTFM.

Deferred formatting.

The task declaration is extended with a set of typed logging channels.

#[task(log = enum Task1 {PkgLen([u8;1]) , Header([[u8;4];1], NO_TIME_STAMP), Time([Instant,2]})]
fn task1(cx:task1::Context) {
  log.info(Task1::Time(cx.scheduled).unwrap();
  ...
  log.trace(Task1::PkgLen(pgk.len()).unwrap();
  ...
  log.info(Task1::Header(pkg.header).unwrap();
  ...
  log.info(Time(Instant::now())).unwrap(); // should be very close to the time stamp done by the logger.
}

RTFM will generate static SPSC queues for each enum variant, where the produces is the application and the consumer is the host.

The implementation of lanes.trace/info etc. enqueues the raw data (T:Copy).

On the host side the the corresponding data is dequeued (and the consumer pointer updated). Here might opt out a time stamp (likely time stampiing should be the default).

An Err(_) result indicate that that the queue is saturated (to this end we need to increase the size of the corresponding buffer).

The RTFM app proc macro generates the host side logger (valid for the particular compilation instance), with shared knowledge of the set of lanes and their enum types (including buffer sizes). The generated host level logger also holds information on the task instance.

A possible rendering of the above:

[0x0000_0123] Info  : task1: Time(0x0000_0100) // the scheduled time
[0x0000_0140] Trace : task1: PkgLen(32)
[0x0000_0160] Info  : task2: ... some logging from a higher prio task
// here we can think of the logger to split incoming data per task to different output streams (similar to the `itm` tool).
...
[-----------] Info  : task1: Header([1,2,3,4])
[0x0000_0223] Info  : task1: Time(0x0000_0210) // Instant.now()

If wall clock time translation of Instant is available, we could even provide time in seconds, etc. but this is another topic... (Notice, we do not want the host to time stamp the logging data since that will be fairly inaccurate in comparison.)

Transport.

The generated host side logger will poll the queues by asynchronous read-memory operations. To this end probe.rs may be used as an outset. The actual locations of the generated SPCS queues is found in the elf (after linking). Tools that may come in handy are:

xmas-elf/gimili, for retrieving elf information.
probe.rs, for asynchronous memory operations.
Soundness. Since we use SPCS queues races on both sides should be possible to avoid (by careful implementation).

Considerations

This is just a sketch. Some initial experiments and work by @japaric (on deferred formatting and in-memory logging), and @korken89 (on elf parsing), have explored simar approaches. None however have taken the outset of RTFM and automatic code generation.

The hereby proposed approach brings some unique opportunities by exploiting type information present in the system under test. It also allows fine grained control over buffer sizes.

However, the approach comes with the complexity of code generation of the logger, and parsing of the elf to deduce the memory locations. To this end a wrapping build script could be a possible solution, generating a "runner" for the compilation instance.

The memory lane based approach offers logging/tracing without the need for additional communication channels for the transport, e.g. ITM/serial etc. Thus, it simplifies setup. Moreover, it would enable tracing on targets without ITM support, without allocating additional serial peripherals. Dependent on implementation it could complement traditional tracing (with target side formatting, such as ITM tracing, potentially using the log facade.

In theory it should play well with other debug features as breakpoints etc., over the same host interface (probe.rs already provides rudimentary gdb stubs, and initial vscode integration).

Zero Cost?

The question here to ask, "can we do better?".

Logging cost: The cost of logging boils down to copying the data. It would be very hard to prove soundness (race freeness) without copying. (Perhaps possible by using a guard on the data, but the copy approach seems the right way here.) Since we use an SPCS queue, we already have info on the availability of new data, we need no additional signalling. Structured data as in structs, enums, arrays etc. are copied verbatim to memory (and is efficiently implemented by the Rust compiler). As time stamping can be opted out, we only need to take the cost in case we want.

In the extreme, a typed log message, would require no copying, on the sender side, just an increment by 1 on the producer side of the SPCS (a plain enum variant like ThisHasNoData, would still be implemented as byte sized data item, as we are using the producer pointer as a means to indicate an "event"). This should boil down to a few CPU cycles!

Transport cost: The only foreseeable implication is bus contention (between the host and the target accessing the same memory bus on in the target's memory). This overhead might be mitigated for systems with several memory banks on different busses (and allocate the SPCS queues in a dedicated bank), an isosteric optimization not likely to be needed, unless under extreme stress (e.g., DMA activities).

Implementation complexity.

Well, there is no free lunch! This will take some effort to iron out. To mention a few remaining challenges. The code generation is likely quite tricky. The build system, to ensure consistency between source code and generated logger. The transport and queue soundness, needs careful implementation.

Limitations and correctness.

The approach builds on Copy types, and make the assumption that that we can re-construct the structural representation on the host side, directly by transmuting the data (retrieved from the target). This makes some assumptions on byte ordering, and layout. Since we do not depend on pointers, and other Rust types have explicit size, there is a good chance it holds, (under the assumption we use repr(C). Maybe there will be dragons? Let's find out!

perlindgren commented 4 years ago

Alternative approach and syntax

An alternative way to approach logging is to exploit the message passing already built in (soft task batteries included). We can add a feature to rtfm, --feature log, that generates the logger. The logger simply logs the message queues (or some subset of them, but the actual logger could be designed in various ways, with or without integration to IDE, and not really in scope here.

Syntax

A possible syntax could look like this.

#[task(binds = UART0, spawn = [foo, bar])] 
fn uart0(c: uart0::Context) {
    ...
    c.spawn.foo(0).unwrap();
    ...
    c.spawn.bar(1.0).unwrap();
    ...
    c.spawn.foo(1).unwrap();
}   

#[task(binds = UART1, spawn = [bar]] 
fn uart1(c: uart1::Context) {
    ...
    c.spawn.bar(1.0).unwrap();
    ...
}   

#[task(log = true, capacity = 4)] // this will be time stamped
fn foo(_: foo::Context, x: u32) {
    // a log task need no body, it will never be scheduled
}

#[task(capacity = 2, time_stamp = true)] // this will be time stamped
fn bar(_: bar::Context, x: f32) {
    // some ordinary task which we want to log parameters
}

Semantics

The log keyword implies that the target task is just intended for logging, thus the corresponding task will never be executed.

The time_stamp keyword indicates that an additional time stamp field is added to the payload (reflecting Instance::now() for the spawn).

For non logging tasks, time_stamp defaults to false (we don't want to bloat ordinary tasks), but can be opted in. For logging tasks time_stamp defaults to true, but can be opted out.

--features log Implies that the target side logger is built. Building without the log feature replaces the spawn to log task by an empty function, thus all log related functionality should be compiled out in an optimized build.

Considerations

In the case of ordinary message queues, the consumer is triggered when a message is available and executed based on priority. If the queue is full when spawning the result is an error. So the unwrapping would never be able to detect if the host side logger is keeping up (essentially we are eves dropping the message queue, and there is no way to tell if there is an external observer or not).

If case of log marked tasks. These are never actually executed. So for those one can let the host side progress the consumer side, and the result of spawn could be used to inform the target application if the host side logger keeps up or not. If there is no host attached, spawn will fail as soon as the queue is full, so if logging is left in production code, result should not be seen as an error.

Most likely a production build will be compiled without the logging feature, but there might be cases where you might want to attach a logger post deployment to instrument what is going on in the system.

Since messages queues are in general MPSC, there might be some locking involved if posting to a logger from different priorities (need to check again the actual implementation behind the message passing queue, maybe re-visiting the message passing implementation in RTFM could lead to further performance improvements....) In any case, if avoiding to use the same queue for logging tasks at different priorities this problem should be possible to mitigate. Moreover, since it builds on the RTFM task model, real-time analysis of the system could incorporate the OH of logging as well.

In general, tasks not bound to interrupts (soft tasks) require a dispatcher (interrupt). In the case of log tasks, as they are never dispatched, they should not require a dispatcher.

Implementation complexity

Similar to the initial syntax proposed.

nickray commented 4 years ago

An opinionated take (offered as food for thought): https://dave.cheney.net/2015/11/05/lets-talk-about-logging

On our side, I'm investigating (pragmatically/hacky) logging features per crate, using funnel. Even more fine-grained control filtering which debug logs are not removed at compile time would be very useful.

perlindgren commented 4 years ago

An opinionated take (offered as food for thought): https://dave.cheney.net/2015/11/05/lets-talk-about-logging

On our side, I'm investigating (pragmatically/hacky) logging features per crate, using funnel. Even more fine-grained control filtering which debug logs are not removed at compile time would be very useful.

Thanks for the pointer. I principle I agree to the opinionated take (all these logging levels seems confusing).

I think however it may be meaningful to be able to filter log messages (both at run-time and for post-mortem processing, pin-pointing patterns leading up to some condition). To that end, one need some sort of (automatic) tagging. For the Memory Lane, doing at message level would allow us to automatically deduce/tag sender and receiver. (So e.g. filtering on received or sent messages by a specific task should be possible.) This type of filtering should be able to reduce "noise" the printout during run-time, while still storing the full data log for post-processing. Whether there is a need or even possible for the designer/programmer to classify the severity/importance of messages at design time is another topic for discussion. Under some circumstances what appears to be of less importance at design time, may prove crucial later to detect a certain pattern, etc. On the other hand it may be useful to filter on the amount (or verbosity) of data presented at run-time, to avoid drenching. Moreover, the need for, and use of logging may look quite different from application to application.

He argues that "info" should be presented to the end user (e.g. informing that a file could not be opened), and sort of unconditionally presented. Personally I don't think that has anything to do with logging, that should be part of the "logic" of the program (not a logging feature), I think logging is a tool to the designer/maintainer (e.g., bringing your car to the workshop, the mechanic/engineer should be able to read out logs (complementary to over the air logging), and log an ECU in response to a test.) So logging can be useful throughout the lifetime of a system (not only design/implementation).

Another interesting thing to look into is the relation between testing, logging, and verification. Testing frameworks primarily focus on functional properties of programs. While this is indeed important, the correctness of an embedded system typically involves timely properties and transactional/transitional behaviour (as in stateful behaviour and ordering of events). Timely properties may under RTFM be addressed at compile time by scheduling analysis (as based on a proper model). However, event ordering and stateful behaviour is quite tricky to address at compile time (model checking comes to mind). The ground work on RTFM was partially done in collaboration with David Pereira et al. at CISTER/Porto-Portugal. They are focusing run-time verification, and we have initiated discussions on the use of Memory Lane to perform deferred (host side) run-time verification. The system events could in this setting model the intended behaviour under some suitable logic (Timed Automata/LTS, etc.), and verified at run-time by an "observer" running on the host. The primary idea here is that the monitoring/verification can be arbitrarily complex, while not affecting the application at run-time.

While run-time verification (and logics) may sound complex/esoteric, this is actually what you all do when observe a log (although you are the "observer" and the "model" and "logic" is in your head). Run-time verification just automates the process, see it like a programmable filter. The "logic" is your programming language, and the "model" is your application, and once done, you can sit-back relax and let the "monitor" do the work reliably (and forever).

perlindgren commented 4 years ago

Regarding the alternative approach, there is a potential concern regarding the log keyword. It may be misused by the user intuitively adding a log to the task with the intention of logging data of a task that is actually being spawned/scheduled (with the proposal, log imples that the task will not be executed.

Alternatively, we could think of a syntax where log is replaced by execute = false to more clearly indicate that the task is not going to be executed. (With the proposal all tasks are available to logging by the host system in any case so there is no need to add log anywhere.)

Another perhaps less important consideration is the visibility of logging. In cases you may want to prevent the visibility of task to be logged by the host. So one may think of an extending the syntax to allow the user to explicitly disable logging of specific tasks to the host. In that case log = false could be thought of as a suitable syntax. In case the message payload is large there will be some strain (side channel) of the host polling the data (there is no OH on the target besides that. Question is whether there is such a use case for log = false not sure.

rtic-rs / rfcs