RFC: Internal flash HIL

granaghan commented 3 years ago

Adding the doc from last week's conversation with a few updates.

Use Case

The intended use for this HIL is to focus on internal flash on SoCs. Prior discussions have sought solutions for any flash, both on and off chip. This is still a good goal for a separate HIL, but internal SoC flashes have unique enough limitations to merit a specific HIL. Notably, these include:

a register interface that programs a single word at a time
no DMA and a small or no write FIFO
no interrupt on operation complete-- polling is required
may have physical banks such that operating on a bank containing code being executed halts execution

Additionally, these flashes may have the unique feature of being memory mapped, allowing direct reads. A quick survey shows that these properties generally hold true for OpenTitan, NRF52, and STM32 chips.

Page Abstraction

Encodes a page based on a bank and index into that bank. Banks may be physical, but a logical concept of a bank may be useful for concepts such as splitting memory to accommodate A and B firmware images. This concept also allows for the HIL implementation for more easily track issues such as allow parallel operations on separate physical banks as well as provides an abstraction that when paired with an offset can resolve ambiguities by APIs that take addresses can create such as requesting to erase an address that is not on a page boundary.

pub struct Page {
    pub bank: usize,
    pub num: usize,
}

When implementing the this HIL, page_from _address and page_to_address functions will likely be helpful, but may not need to be exposed on the HIL.

Flash Poperties

The FlashProperties trait exposes constants that describe specific parameters to an implementation of the HIL.

pub trait FlashProperties {
    const WRITE_SIZE: usize,
    const ERASE_SIZE: usize,
    const PAGE_SIZE: usize,
}

Read

/// Read from a flash page into a buffer.
/// Synchronous operation.
fn read(&self, page: &Page, offset: usize, buf: &mut [u8]) -> Result<(), ErrorCode> {
    self.read_slice(page, offset, buf.into())
}

/// Read from a flash page into a process slice.
/// Synchronous operation.
fn read_slice(
    &self,
    page: &Page,
    offset: usize,
    buf: &WriteableProcessSlice,
) -> Result<(), ErrorCode>;

Takes a page, offset into the page, and a buffer. Synchronously reads up to one full page into the buffer. This assumes the reads are quick enough to read this amount of data without causing too much kernel latency, which holds true for the aforementioned chips.

Must return an error if the read would cross a page boundary.

Write

pub enum WriteSuccess {
    /// Write fully completed.
    /// Returns the number of bytes written.
    Complete(usize),
    /// Write partially succeeded, but was interrupted.
    ///Returns the number of bytes written.
    InProgress(usize),
}

/// Write data into a flash page from a process slice.
/// Synchronous operation.
fn write(
    &self,
    page: &Page,
    offset: usize,
    buf: &ReadableProcessSlice,
) -> Result<WriteSuccess, ErrorCode>;

A write that takes in a page, offset into the page, and a buffer. On success this should return whether the entire buffer was written as well as how many bytes were written so that the caller is able to resume interrupted writes. The buffer is not stored inside the HIL in order to prevent the need for complicated or static lifetimes. This also allows user space allow'd buffers to be written from without requiring a copy. When to interrupt a write is left to the specific HIL implementation as it may be influenced by factors such as write timings. In my current implementation, writes return after either 1 ms or an interrupt is pending.

Must return an error if the size if not an integer multiple of FlashProperties::WRITE_SIZE.
Must return an error if the write would cross a page boundary.
Number of bytes to write before returning InProgress is determined by the implementation. Factors to consider include the period of any watchdog timers, write speed, and allowable kernel latency.
This API does not protect against multiple writes. The behavior of writing a word multiple times is undefined.

Zeroize

fn zeroize(
    &self,
    page: &Page,
    offset: usize,
    length: usize,
    write_mode: WriteMode,
) -> Result<WriteSuccess, ErrorCode>;

Zeroize is a special case of write that eliminates the requirement to pass in a buffer. Security certifications often require that secrets can be ensured to be deleted. Zeroize provides the ability to clear just the secret without having to perform a full page erase. This relies on a consistent property of both NOR and NAND flashes that bits can always be written to 0. Zeroize is similarly interruptible to Write.

Must return an error if the operation would cross a page boundary.

Erase

/// Erase a page of flash.
/// Client::erase_complete() is called on completion.
fn erase_page(&self, page: &Page) -> Result<(), ErrorCode>;

Takes in a page and initiates an erase. An implementation may choose how to poll for completion and call the client. This could be though means such as issuing dynamic deferred callback to check every kernel scheduling cycle or by setting timers for when the operation is expected to complete.

Verify

/// Verifies that data in flash matches the slice.
/// Returns Ok if matches and FAIL otherwise
/// Synchronous operation.
fn verify_slice(
    &self,
    page: &Page,
    offset: usize,
    buf: &ReadableProcessSlice,
) -> Result<(), ErrorCode>;

Takes a page, offset into a page, and a buffer, verifying that the contents of flash match the buffer. Read timing assumptions from the Read operation apply here. This operation is an optimization to prevent having to keep two buffers in memory to verify the contents of flash. Current usage of this interface is verifying that a firmware update was properly written before rebooting into a new image.

Must return an error if the read would cross a page boundary.

Synchronous Write and Erase

/// Write data into a flash page from a process slice.
/// Synchronous operation.
fn write(
    &self,
    page: &Page,
    offset: usize,
    buf: &ReadableProcessSlice,
) -> Result<(), ErrorCode>;

/// Erase a page of flash synchronously.
fn erase_page_sync(&self, page: &Page) -> Result<(), ErrorCode>;

Special purpose write and erase function intended for early boot, before the kernel scheduler is running. These should not be used once the kernel scheduler is running as this will prevent interrupts bottom halves from running and may also result in a watchdog reset. These should likely be implemented as a separate trait so capsules that should not be able to issue synchronous operations cannot.

Access Control

pub enum Permission {
    ReadOnly,
    ReadWrite,
    ReadWriteErase
};
pub struct AccessControlConfig {
    start: Page,
    count: usize,
    perm: Permission
}

pub trait AccessControl {
    fn new(config: &[AccessControlConfig]) -> Self;
    fn read_allowed(
        &self,
        start: &Page,
        len: usize,
    ) -> Result<(), ErrorCode>;
    fn write_allowed(
        &self,
        start: &Page,
        len: usize,
    ) -> Result<(), ErrorCode>;
    fn erase_allowed(
        &self,
        start: &Page,
    ) -> Result<(), ErrorCode>;
}

AccessControlConfig describes a region of pages and the operations allowed on them. AccessControl is an object that returns an error if an operation is not allowed. An AccessControl is initialized with an array of AccessControlConfig.

Open questions:

Should the read, write, and erase trait require an AccessControl be set?
If not, is the default to allow all operations if one it not set?

hudson-ayers commented 3 years ago

Re: the Permission enum: When would you want to give a caller ReadWrite but not ReadWriteErase? The caller could just write all 0s, right?

vsukhoml commented 3 years ago

It may be more efficient to pass Page by value than by reference into read()/write() - this struct is lightweight and will be passed as 2 regular arguments.

AccessControlConfig can be based on standard range - https://doc.rust-lang.org/std/ops/struct.Range.html.

alistair23 commented 3 years ago

This looks similar to https://github.com/tock/tock/pull/2248 and https://github.com/tock/tock/pull/2249. The Page concept was suggested by Leon, but never gained any traction.

I feel that combining the async and sync operations in one HIL is prone to errors. I think it makes more sense to split it up into two HILs. Otherwise callers have to keep track on if a callback should occur or not.

I don't understand what AccessControl would be used for? Are you saying that the HIL would allow specifying access controls? That should be done in a capsule and not in the chip drivers though.

granaghan commented 3 years ago

Re: the Permission enum: When would you want to give a caller ReadWrite but not ReadWriteErase? The caller could just write all 0s, right?

That's a fair point. Unless access become more granular than page level, it makes sense to just have ReadOnly or ReadWrite permissions.

I feel that combining the async and sync operations in one HIL is prone to errors. I think it makes more sense to split it up into two HILs. Otherwise callers have to keep track on if a callback should occur or not.

Yes, they would be separate traits.

I don't understand what AccessControl would be used for? Are you saying that the HIL would allow specifying access controls? That should be done in a capsule and not in the chip drivers though.

This went through some revisions during the conversation. The outcome is to take an approach similar to network_capabilities. Code initializing capsules will create the proper capability and pass it to the capsule to use when making calls into the HIL.

tock / tock