knurling-rs / defmt

Efficient, deferred formatting for logging on embedded systems
https://defmt.ferrous-systems.com/
Apache License 2.0

Hash .defmt section and store in firmware #328

Open apgoetz opened 3 years ago

apgoetz commented 3 years ago

Would it be possible to reserve a small section of defmt ids at the beginning of memory, in order to guarantee certain messages are located there? These reserved ids could be used by the defmt library, or in a potential future use case, they could be exposed to the end user.

Currently, as far as I know, there is no way for the defmt parser to "match" a given defmt symbol table to a defmt log message stream. By reserving a "well-known" defmt ID for a build ID, the parser can confirm that the ELF matches the firmware on the embedded system without reloading the code.

Applications could also use these IDs for fixed identifiers that don't vary between builds (e.g., crate name, product name, serial number, etc.).

japaric commented 3 years ago

Would it be possible to reserve a small section of defmt ids at the beginning of memory

Sounds possible, but it may need to be done in the cortex-m-rt linker script, somehow, for the data to appear at the beginning of Flash but after the vector table. Trying to do this from defmt.x may result in the data not appearing in the right place (i.e. at a known memory location), or it may create an invalid memory layout (e.g. the defmt data is placed before the vector table at address 0x0, so the device won't boot).

Propose reserving 'well-known' IDs in defmt section

In the previous quote you mention storing defmt IDs at the beginning of the device memory, and here you mention the .defmt section. To clarify: the .defmt linker section (which contains the table that maps indices to complete strings) is not stored in the target's memory. This behavior is intended, as it greatly reduces Flash usage.

Currently, as far as I know, there is no way for the defmt parser to "match" a given defmt symbol table to a defmt log message stream

Correct.

By reserving a "well-known" defmt ID for a build ID, the parser can confirm the ELF matches embedded system without reloading the code.

To check whether the firmware on the device matches the ELF passed to e.g. probe-run, the "well-known ID" (it could be a hash / digest of the entire defmt table) would have to either (a) be sent from the device first thing on boot (i.e. appear first in the defmt data stream) or (b) be stored in the device's Flash at some known location (*) (e.g. right after the vector table, or at the end of Flash).

(*) It may not actually need to be a hard-coded location just to be able to reject an ELF; its location could still be decided by the linker (e.g. it could be stored in a static variable), I think.
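A minimal sketch of the host-side comparison, assuming the digest is a 64-bit FNV-1a hash over the interned strings in table order. The hash choice and the names `fnv1a64`, `table_digest` and `elf_matches_device` are illustrative assumptions; nothing here is part of defmt or probe-run.

```rust
/// FNV-1a, 64-bit: a simple non-cryptographic hash, used here only for illustration.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical digest of the defmt table: every interned string, in order.
/// A 0xff separator (never valid in UTF-8) keeps ["ab", "c"] distinct from ["a", "bc"].
fn table_digest(entries: &[&str]) -> u64 {
    let mut buf = Vec::new();
    for entry in entries {
        buf.extend_from_slice(entry.as_bytes());
        buf.push(0xff);
    }
    fnv1a64(&buf)
}

/// The host rejects the ELF when its digest differs from the one the device
/// reported (option a) or the one read back from Flash (option b).
fn elf_matches_device(entries: &[&str], device_digest: u64) -> bool {
    table_digest(entries) == device_digest
}
```

Either delivery route, (a) or (b), feeds the same comparison; only the way `device_digest` is obtained differs.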

apgoetz commented 3 years ago

Sorry, I wasn't very precise above. I don't mean storing an ID in the device memory; I was referring to reserving defmt message IDs so that their meaning is fixed. For example, the linker script would be modified to something like this:

$ cat defmt.x.in
/* exhaustively search for these symbols */
EXTERN(_defmt_acquire);
EXTERN(_defmt_release);
EXTERN(__defmt_default_timestamp);
PROVIDE(_defmt_timestamp = __defmt_default_timestamp);
PROVIDE(_defmt_panic = __defmt_default_panic);

SECTIONS
{

  /* `0` specifies the start address of this virtual (`(INFO)`) section */
  .defmt 0 (INFO) :
  {

    /* reserve well-known build-id at location 0 */
    "{\"tag\":\"defmt_well_known\",\"data\":\"Build-ID: {:str}\"}" = 1;

    /* reserve 3 message ids for future use by defmt */
    . = ALIGN(4);

    /* Reserve space for well-known messages; the user can specify strings here that are output for the application */
    .defmt.user-wellknown1;
    .defmt.user-wellknown2;
    .defmt.user-wellknown3;
    .defmt.user-wellknown4;

    /* Format implementations for primitives like u8 */
    *(.defmt.prim.*);

    /* Everything user-defined */
    *(.defmt.*);

    /* $DEFMT_VERSION may contain special chars, so we quote the symbol name */
    /* Note that the quotes actually become part of the symbol name though! */
    "_defmt_version_ = $DEFMT_VERSION" = 1;
  }
}

ASSERT(SIZEOF(.defmt.user-wellknown1) == 1, ".defmt.user-wellknown1 must be defined exactly once");
ASSERT(SIZEOF(.defmt.user-wellknown2) == 1, ".defmt.user-wellknown2 must be defined exactly once");
ASSERT(SIZEOF(.defmt.user-wellknown3) == 1, ".defmt.user-wellknown3 must be defined exactly once");
ASSERT(SIZEOF(.defmt.user-wellknown4) == 1, ".defmt.user-wellknown4 must be defined exactly once");

ASSERT(SIZEOF(.defmt) < 16384, ".defmt section cannot contain more than (1<<14) interned strings");

With this approach, defmt ID 0 is reserved for the build ID, 3 IDs are reserved for future use by the defmt crate, well-known user messages start at 4, and dynamically assigned messages start at 8.
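The ID layout described here can be sketched as a small host-side classifier. This is only an illustration of the proposed ranges; the `IdClass` type and `classify` function are hypothetical names, not existing defmt items.

```rust
/// Hypothetical classification of a defmt message ID under the proposed layout.
#[derive(Debug, PartialEq, Eq)]
enum IdClass {
    /// ID 0: the well-known build-ID message.
    BuildId,
    /// IDs 1..=3: reserved for future use by the defmt crate.
    ReservedForDefmt,
    /// IDs 4..=7: the four user-defined well-known messages.
    UserWellKnown,
    /// IDs 8 and up: assigned by the linker as usual.
    Dynamic,
}

fn classify(id: u16) -> IdClass {
    match id {
        0 => IdClass::BuildId,
        1..=3 => IdClass::ReservedForDefmt,
        4..=7 => IdClass::UserWellKnown,
        _ => IdClass::Dynamic,
    }
}
```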

A separate macro could be defined for well known messages, that puts them in one of these specific sections.

During use, if there is a message with ID 0 in the log stream, whatever application is decoding the defmt messages would be confident that the selected ELF matches the message stream.

The application wouldn't be confident for messages before the build-id is sent, except for the well known messages, which would have a fixed format (assuming the user doesn't change them during development).
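As a rough sketch of that decoder-side behaviour, the checker below stays "unverified" until a frame with ID 0 arrives, then records whether its payload equals the build ID taken from the ELF. The types `Trust` and `StreamChecker`, and the assumption that the payload is already decoded to a string, are illustrative only.

```rust
/// How much the decoder trusts that the ELF matches the stream so far.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Trust {
    /// No build-ID frame seen yet; only fixed well-known messages are reliable.
    Unverified,
    /// A build-ID frame matched the ELF.
    Verified,
    /// A build-ID frame disagreed with the ELF.
    Mismatch,
}

/// Hypothetical helper tracking trust across frames.
struct StreamChecker<'a> {
    expected_build_id: &'a str,
    trust: Trust,
}

impl<'a> StreamChecker<'a> {
    fn new(expected_build_id: &'a str) -> Self {
        StreamChecker { expected_build_id, trust: Trust::Unverified }
    }

    /// Feed one decoded frame (message ID plus its string payload).
    /// Only frames with ID 0 change the trust level.
    fn observe(&mut self, id: u16, payload: &str) -> Trust {
        if id == 0 {
            self.trust = if payload == self.expected_build_id {
                Trust::Verified
            } else {
                Trust::Mismatch
            };
        }
        self.trust
    }
}
```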

However, I don't know if it would be possible to guarantee that the build ID message is always in the log stream. Even if you forced this message to be sent at reset, there is no guarantee that the message will be logged (for example, with defmt messages sent over an intermittent connection, or a blackbox application that only records the last N messages).

Therefore I agree: for use cases that involve attaching a debugger (i.e., probe-run), it would be useful to have a symbol defined in the .rodata section with the fixed build ID. This symbol probably doesn't need to be at a fixed location in memory. If probe-run reads the data at that symbol's location and it doesn't match, that is sufficient to show that the build ID is incorrect.
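A minimal sketch of that idea, assuming an 8-byte build ID held in an ordinary static whose address is left to the linker. The name `DEFMT_BUILD_ID`, its placeholder value, and the helper `build_id_matches` are hypothetical; how probe-run would actually locate and read the symbol is not shown.

```rust
/// Firmware side (hypothetical): a build ID stored in read-only data.
/// `#[used]` asks the compiler to keep the static even though nothing reads it.
/// A real setup would also need a stable symbol name so the host can find it
/// in the ELF symbol table; that part is left out of this sketch.
#[used]
pub static DEFMT_BUILD_ID: [u8; 8] = *b"deadbeef"; // placeholder value

/// Host side (hypothetical): compare the bytes read from target memory at the
/// symbol's address with the bytes stored for that symbol in the ELF.
fn build_id_matches(read_from_device: &[u8], expected_from_elf: &[u8]) -> bool {
    read_from_device == expected_from_elf
}
```

Because the host looks the address up in the ELF it was given, a mismatch (or garbage at that address) is enough to reject the ELF without any fixed memory location.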

jonas-schievink commented 3 years ago

If the goal is to ensure that the running firmware matches the ELF file opened on the host, we could just define an extra symbol for that, no need to reserve IDs anywhere.

However, I don't immediately see how the build ID would be computed and stored in the firmware. After all, procedural macros cannot keep state between invocations, so we can't compute a checksum of all the messages (and even if we could do this, it wouldn't work cross-crate).

Theoretically, one could reserve a u32 in .data that is filled with a checksum of the entire ELF file after building. A Cargo runner could then inject the computed checksum and make it available to both the app and the defmt host tooling. However, this requires quite an elaborate setup.
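To make the reserved-u32 idea concrete, here is a sketch that treats the built image as a byte buffer with a 4-byte slot at a known offset. It assumes a 32-bit FNV-1a checksum computed with the slot zeroed and stored little-endian; the functions `fnv1a32`, `patch_checksum` and `verify_checksum` are illustrative, and locating the slot inside a real ELF is out of scope.

```rust
/// FNV-1a, 32-bit; stands in for whatever checksum would really be used.
fn fnv1a32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Zero the reserved slot, checksum the whole image, store the result in the slot.
/// Returns `None` if the slot does not fit inside the image.
fn patch_checksum(image: &mut [u8], slot: usize) -> Option<u32> {
    let end = slot.checked_add(4)?;
    image.get_mut(slot..end)?.fill(0);
    let sum = fnv1a32(image);
    image[slot..end].copy_from_slice(&sum.to_le_bytes());
    Some(sum)
}

/// Recompute the checksum with the slot zeroed and compare it to the stored value.
fn verify_checksum(image: &[u8], slot: usize) -> bool {
    let end = match slot.checked_add(4) {
        Some(end) if end <= image.len() => end,
        _ => return false,
    };
    let mut stored = [0u8; 4];
    stored.copy_from_slice(&image[slot..end]);
    let mut copy = image.to_vec();
    copy[slot..end].fill(0);
    fnv1a32(&copy) == u32::from_le_bytes(stored)
}
```

Zeroing the slot before hashing avoids the circularity of a checksum that would otherwise have to cover itself.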

I'd also be wary about making the message stream stateful. The ability to decode arbitrary defmt frames without knowing prior ones can be quite useful. (I know you could still do that here; you'd just lose confidence that the builds match.)

japaric commented 3 years ago

Theoretically, one could reserve a u32 in .data that is filled with a checksum of the entire ELF file after building. A Cargo runner could then inject the computed checksum

I would compute and insert the checksum into the ELF in a linker wrapper, like flip-link. IMO, that seems the most transparent way to do this because to the end user "it just happens" during the linking stage of a cargo build.

A cargo runner sounds like it would be too late because a cargo build would produce an invalid artifact (missing checksum). A user may use something like a cargo-xtask instead of a Cargo runner to flash / run / do-something with the ELF and the xtask would operate on an invalid ELF.
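One piece such a linker wrapper would need is the path of the artifact it has to patch after the real linker has run. The sketch below assumes the wrapper receives the usual `-o <path>` pair among its arguments; the function `output_path` is a hypothetical helper, not code taken from flip-link.

```rust
/// Hypothetical helper for a linker wrapper: find the output file named by
/// `-o <path>` in the arguments that would be forwarded to the real linker.
fn output_path(args: &[String]) -> Option<&str> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg == "-o" {
            // The path is the argument that follows `-o`, if there is one.
            return iter.next().map(String::as_str);
        }
    }
    None
}
```

The wrapper would forward all arguments unchanged, wait for the linker to succeed, and only then compute and insert the checksum into the file at this path, so that a plain cargo build already yields a complete artifact.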

jonas-schievink commented 3 years ago

I would compute and insert the checksum into the ELF in a linker wrapper, like flip-link. IMO, that seems the most transparent way to do this because to the end user "it just happens" during the linking stage of a cargo build.

Ah, right, good idea!