[RFC]Introduce Hardware-level Device Isolation Subsystem

povergoing commented 1 year ago

Introduction

As presented at ZDS, https://events.linuxfoundation.org/embedded-open-source-summit/program/schedule/, we propose to introduce Hardware-level Device Isolation Subsystem to Zephyr:

Slides from the ZDS 2023 talk: https://static.sched.com/hosted_files/eoss2023/b0/Introduce%20Hardware-Level%20Device%20Isolation%20to%20Zephyr.pdf
The video is visible here: https://www.youtube.com/watch?v=PnyZ77B8pgA

EDIT: 08 Sept 2023

The draft https://github.com/zephyrproject-rtos/zephyr/pull/62374.
The readme is here: https://github.com/zephyrproject-rtos/zephyr/blob/9c673813fcee91a794fcc966464f5ca9fe2d8c8c/samples/hello_world/smmu_readme.md

Most architectures in Zephyr use MMU/MPU to isolate the thread memory regions to protect the system from buggy or malicious code.

However, MMU/MPU can only limit memory accesses from CPUs. Memory accesses such as those from DMA are not protected by MMU/MPU, which may cause critical security issues.

Problem description

Zephyr has been adding more DMA devices to the code, while many DMA devices might be buggy or even malicious. The Zephyr system access control provided by MMU/MPU does not apply to DMA devices so buggy or malicious DMA devices might bypass the Zephyr access control. Without taking action, Zephyr would be under increasing security risk.

Problem description

Zephyr has been adding more DMA devices to the code, while many DMA devices might be buggy or even malicious [1][2][3][4][5][6]. The Zephyr system access control provided by MMU/MPU does not apply to DMA devices so buggy or malicious DMA devices might bypass the Zephyr access control. Without taking action, Zephyr would be under increasing security risk.

DMA_attack

[1] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html [2] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/ [4] https://web.archive.org/web/20160304055745/http://www.hermann-uwe.de/blog/physical-memory-attacks-via-firewire-dma-part-1-overview-and-mitigation [5] https://www.manageengine.com/device-control/prevent-dma-attacks.html [6] https://en.wikipedia.org/wiki/DMA_attack

Proposed change

Add a new SMMU/IOMMU Subsys framework for Zephyr so it can be easily extended in the future due to the variety of hardware-level solutions provided by different architectures
To mitigate the issue, we introduce SMMU technology for the Cortex-A platform as an implementation example.

Detailed RFC

Introduce a subsystem for Zephyr to isolate the DMA devices by leveraging the HW-lvl isolation technologies. The subsystem will design generic APIs for the DMA device drivers to restrict DMA access within the expected memory boundaries. The APIs are independent of the specific isolation technologies and can be easily extended to support multiple technologies such as SMMU, IOMMU .etc.

Proposed change (Detailed)

Overview

The following diagram illustrates the overview design. Overview_design The Zephyr Dev Isolation subsystem consists of 2 parts:

allow the DMA driver to register the devices into the system
allow the DMA driver to map/remap/unmap the DMA buffer for the devices

Dev Isolation Domains: Defined an address space. One Domain has one linear space with multi regions. This is a reserved concept for the extension of virtualization.

DTS

Basic requirements to make them work:

Essential info to describe

HW device isolation info (e.g. SMMUv3)
DMA devices
The relationship between Dev Isolation and DMA devices

Specifically, this DTS interface should follow the standard of the DT bindings:

NOTE: The standard is designed for the MMU system. Zephyr uses linear address space, might need a more generic design?

DMA_dev_node provides necessary information on the DMA devices
- compatible
- reg/size
- dev type
- ...
- iommus or iommu-map to identify what device isolation technology to use.
The iommus or iommu-map attribute is the most important item under the DMA devices node to be restricted.
- The iommus or iommu-map contains A list of phandle and IOMMU specifier pairs that describe the IOMMU master interfaces of the device.
- StreamID information
Device isolation node (for example: smmu: smmu@xxxx) provides the Dev Isolation info servicing for the corresponding device isolation driver (e.g. SMMUv3 driver)
- compatible
- reg/size
- cells

DMA_dev_node:xxxxxx {
    compatible = "xxxx";
    reg = <xxxx xxxx>;
    ...

    iommu-map = <&smmu 0 0 0x10000>;
};

smmu: smmu@xxxx {
    compatible = "arm,smmu-v3";
    reg = <xxxx xxxx>;
    #iommu-cells = <3>;
};

Taking AHCI with PCI as an example:

pci: pci@40000000 {
    compatible = "pci-host-ecam-generic";
    reg = <0x40000000 0x10000000>;
    msi-parent = <&its>;

    #address-cells = <2>;
    #size-cells = <1>;
    ranges = <0x2000000 0x50000000 0x50000000 0x10000000>;

    iommu-map = <&smmu 0 0 0x10000>;

    ahci: ahci0 {
        compatible = "ata-ahci";

        vendor-id = <0x0abc>;
        device-id = <0xaced>;

        status = "okay";
    };
};

smmu: smmu@2b400000 {
    compatible = "arm,smmu-v3";
    reg = <0x2b400000 0x100000>;
    #iommu-cells = <3>;
};

PCI node provides the essential information to make the PCI and AHCI driver work. The smmu node provides the essential information for SMMUv3. The iommu-map item under the PCI node points to smmu node and the StreamID

For more examples see https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt for non-PCI devices and https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt for PCI devices.

APIs

APIs for DMA driver to use

int deviso_dom_switch( … )

This is for users to switch domains, but currently, it's reserved.

int deviso_ctx_alloc(const struct device *dev, uint32_t sid)

This is for the DMA driver to allocate a context which means letting the driver register the device into the system, correspondingly the underlying device isolation technology will allocate some resources for the DMA driver and devices.

int deviso_ctx_free(const struct device *dev, uint32_t sid)

This is a reverting call of the deviso_ctx_alloc, to free the resources.

int deviso_map(const struct device *dev, uint32_t sid, uintptr_t base, size_t size)

This API allows the DMA driver to restrict the DMA buffer allocated for DMA devices. Upon restriction, the DMA devices are only allowed to access the restricted memory regions. _NOTE: xxxmap may not be a proper name

int deviso_unmap(const struct device *dev, uint32_t sid, uintptr_t base, size_t size)

This API is designed to eliminate the restriction done by deviso_ map _NOTE: xxxunmap may not be a proper name

APIs struct for implementations

Every device isolation implementation should implement and populate the following APIs.

__subsystem struct iommu_driver_api {
    int (*deviso_dom_switch)( … )

    int (*ctx_alloc)(const struct device *dev, uint32_t sid);
    int (*ctx_free)(const struct device *dev, uint32_t sid);

    int (*deviso_map)(const struct device *dev, uint32_t sid, uintptr_t base, size_t size)
    int (*deviso_unmap)(const struct device *dev, uint32_t sid, uintptr_t base, size_t size)
};

Integration to Memory Blocks Allocator

The Zephyr DMA device drivers use Memory Blocks Allocator (https://docs.zephyrproject.org/latest/kernel/memory_management/sys_mem_blocks.html) to allocate a DMA buffer for the DMA devices to transmit data. Therefore, the device isolation APIs can be integrated into the Memory Blocks Allocator, so that the DMA driver can leverage the SMMU without any changes.

Q: Why still keep the APIs there since they can be integrated into Memory Blocks Allocator? A: Because the use cases are complicated, leaving the subsystem independently there provides flexibility for different use cases from a design perspective.

The diagram shows how this subsystem works

subsystem_work

Use cases

There are several use cases among threads, DMA devices, and Device Isolation technologies. Taking SMMU as an implementation example.

On a single SMMU platform:

A single thread uses only one DMA device

This is the most basic use case.

The DMA driver calls deviso_ctx_alloc to register the device into the underlying SMMU driver during the driver init stage.
The thread opens a DMA device.
The SMMU driver alloc a stream table entry and other resources for the device.
The DMA driver allocates a buffer on memory as a DMA buffer for the DMA device to transmit.
The DMA driver calls deviso_map to map the region on SMMU and kick off the DMA device to transmit.
Thus the DMA device is restricted and is prevented from accessing other regions except for the DMA buffer.

A single thread uses multiple DMA devices

Multiple DMA devices use case is quite similar to the single one except for the SMMU usage. Every device has its own stream table entry points to its own translation table.

It's allowed for all DMA devices to share one translation table to lower memory usage. This should depend on the use case and it's a trade-off of security, memory space usage, and performance.

Q: What about the same class devices sharing one driver? A: There is no difference from an SMMU's perspective as long as the DMA device has its own unique StreamID.

Multiple threads share one DMA device

It's worth noting that DMA drivers should deal with the multi threads sharing of one device. Different drivers have their own solutions to share the device with multi-threads. From the devices' perspective, their simplified workflow is just waiting for the signal to start and then putting the data into the allocated memory region without caring about to whom the data is sent. Therefore, without changing the logic, the DMA driver is easy to "hook" to call _devisomap to restrict the memory regions (DMA buffers).

Also, there is a trade-off of whether all threads that are sharing one device share one restricted memory space. In another word,

The DMA device sees all regions for the threads. This strategy has a good performance, although it might break all regions it can see if the DMA device encounters a bug, it seems useless to isolate memory regions for all threads since the DMA device is already broken. The following diagram shows the principle.
The DMA device sees regions only for the current thread. This strategy is more secure, but it will increase the overhead since every thread needs an individual translation table. This strategy needs to leverage the substream ID and CD table capabilities of SMMU. See the following diagram.

Multiple threads share multiple DMA device This use case is basically a combination of the first two mentioned use cases.

On a multiple SMMUs platform:

A single thread uses multiple DMA devices over multiple SMMUs

There is no outstanding difference in terms of DMA driver compared with the 'A single thread using multiple DMA devices on a single SMMU platform'. The difference lies in the APIs of the device isolation subsystem. As designed, the APIs will eventually call the function pointers in struct iommu_driver_api implemented by the underlying device isolation technologies. To support multiple SMMUs, The APIs should detect which SMMU this device belongs to and then call the corresponding SMMU's implementation. This hypotaxis is defined from DTS and will be memorized during the ctx allocation stage.

Multiple threads share multiple DMA devices over multiple SMMUs

This use case is quite similar to the above ones.

Dependencies

This might slightly affect the corresponding DMA device drivers.

Concerns and Unresolved Questions

The standard is designed for the MMU system. Zephyr uses linear address space, might need a more generic design? At least the xxx_map/unmap need to be reconsidered.

Alternatives

povergoing commented 1 year ago

Tagging @carlocaione, as you might be interested. Tagging @SgrrZhf, as he is familiar with the implementation details.

carlocaione commented 1 year ago

@teburd can also be interested

teburd commented 1 year ago

This seems really neat, though I need to better think about how this fits into Zephyr.

I'd ask for some real world examples of how this would be used in the context of Zephyr device drivers or applications. I do think its a neat idea. I'm just not familiar enough with all the details to see where it might be useful off hand and would love to see that as part of the RFC.

E.g. how would this be used with a peripheral interface API, the DMA API today, in an application, etc.

dcpleung commented 1 year ago

If it is purely doing memory mapping, it would be a good idea to put this under sys_mm (maybe sys_mm_dev_*).

microbuilder commented 1 year ago

@ceolin @d3zd3z Perhaps something worth discussing in the security WG call, although with Jaxson and Huifeng in China the time zone is difficult for them to present the work.

microbuilder commented 1 year ago

For context:

Slides from the ZDS 2023 talk: https://static.sched.com/hosted_files/eoss2023/b0/Introduce%20Hardware-Level%20Device%20Isolation%20to%20Zephyr.pdf
The video is visible here: https://www.youtube.com/watch?v=PnyZ77B8pgA

povergoing commented 1 year ago

Sorry for the delay.

This seems really neat, though I need to better think about how this fits into Zephyr.

I'd ask for some real world examples of how this would be used in the context of Zephyr device drivers or applications. I do think its a neat idea. I'm just not familiar enough with all the details to see where it might be useful off hand and would love to see that as part of the RFC.

E.g. how would this be used with a peripheral interface API, the DMA API today, in an application, etc.

Good point, but we need some time to provide the details. Will add it into the RFC. Simply put, we build a PoC: AHCI controller (for SATA devices, driver is WIP) on PCI + SMMUv3 to restrict the AHCI controller.

Without SMMU restriction, the simplified process to enable the AHCI controller is 1) allocating a buffer (using Memory Blocks Allocator), 2) telling the address to the ACHI controller, and 3) signaling the controller to start transmitting data to that buffer. In theory, now the AHCI can access memory randomly instead of forcefully being restricted within that buffer range.
With SMMU, before kicking off the ACHI controller, the ACHI driver utilizes the SMMU to restrict the memory region. So, the AHCI controller can only access the buffer region allowed by SMMU and is prevented from accessing other regions.

If it is purely doing memory mapping, it would be a good idea to put this under sys_mm (maybe sys_mm_dev_*).

TBH, we didn't go through the sys_mm (you mean sys_mm_drv_*?) Can I ask it is designed to do mapping addresses for CPUs or for peripherals (DMA devices)? For what initiators is this API designed to do the mapping?

@ceolin @d3zd3z Perhaps something worth discussing in the security WG call, although with Jaxson and Huifeng in China the time zone is difficult for them to present the work. For context: Slides from the ZDS 2023 talk: https://static.sched.com/hosted_files/eoss2023/b0/Introduce%20Hardware-Level%20Device%20Isolation%20to%20Zephyr.pdf The video is visible here: https://www.youtube.com/watch?v=PnyZ77B8pgA

Thank you, Kevin. What should I do? Am I going to present it again? or? Not sure if the sys_mm that @dcpleung mentioned is actually the same thing we proposed. If yes, we probably need to rework it and see if it can be integrated into that API (or maybe need to extend the API too).

ceolin commented 1 year ago

Sorry for the delay.

This seems really neat, though I need to better think about how this fits into Zephyr. I'd ask for some real world examples of how this would be used in the context of Zephyr device drivers or applications. I do think its a neat idea. I'm just not familiar enough with all the details to see where it might be useful off hand and would love to see that as part of the RFC. E.g. how would this be used with a peripheral interface API, the DMA API today, in an application, etc.

Good point, but we need some time to provide the details. Will add it into the RFC. Simply put, we build a PoC: AHCI controller (for SATA devices, driver is WIP) on PCI + SMMUv3 to restrict the AHCI controller.

Without SMMU restriction, the simplified process to enable the AHCI controller is 1) allocating a buffer (using Memory Blocks Allocator), 2) telling the address to the ACHI controller, and 3) signaling the controller to start transmitting data to that buffer. In theory, now the AHCI can access memory randomly instead of forcefully being restricted within that buffer range.

With SMMU, before kicking off the ACHI controller, the ACHI driver utilizes the SMMU to restrict the memory region. So, the AHCI controller can only access the buffer region allowed by SMMU and is prevented from accessing other regions.

If it is purely doing memory mapping, it would be a good idea to put this under sys_mm (maybe sys_mm_dev_*).

TBH, we didn't go through the sys_mm (you mean sys_mm_drv_*?) Can I ask it is designed to do mapping addresses for CPUs or for peripherals (DMA devices)? For what initiators is this API designed to do the mapping?

@ceolin @d3zd3z Perhaps something worth discussing in the security WG call, although with Jaxson and Huifeng in China the time zone is difficult for them to present the work. For context: Slides from the ZDS 2023 talk: static.sched.com/hosted_files/eoss2023/b0/Introduce%20Hardware-Level%20Device%20Isolation%20to%20Zephyr.pdf The video is visible here: youtube.com/watch?v=PnyZ77B8pgA

Thank you, Kevin. What should I do? Am I going to present it again? or? Not sure if the sys_mm that @dcpleung mentioned is actually the same thing we proposed. If yes, we probably need to rework it and see if it can be integrated into that API (or maybe need to extend the API too).

Not need to present it again (maybe a highlevel overview) :) I think we could focus on how this fits in Zephyr userspace, which assets it is protecting, relate it with the usermode threat model (https://docs.zephyrproject.org/latest/kernel/usermode/overview.html#threat-model), ... We can talk more about these aspects of this RFC.

povergoing commented 1 year ago

Not need to present it again (maybe a highlevel overview) :) I think we could focus on how this fits in Zephyr userspace, which assets it is protecting, relate it with the usermode threat model (https://docs.zephyrproject.org/latest/kernel/usermode/overview.html#threat-model), ... We can talk more about these aspects of this RFC.

@ceolin No problem for me. IIRC, the meeting is https://github.com/zephyrproject-rtos/zephyr/wiki/Security-Working-Group? Shall we discuss it next Monday? Or any other suggested slot?

carlocaione commented 1 year ago

@ceolin No problem for me. IIRC, the meeting is https://github.com/zephyrproject-rtos/zephyr/wiki/Security-Working-Group?

Can you present to the Architecture meeting instead? Bigger audience ad this is not 100%-only security related.

povergoing commented 1 year ago

@ceolin No problem for me. IIRC, the meeting is https://github.com/zephyrproject-rtos/zephyr/wiki/Security-Working-Group?

Can you present to the Architecture meeting instead? Bigger audience ad this is not 100%-only security related.

Definitely okay for me. Is it settled? This one https://github.com/zephyrproject-rtos/zephyr/wiki/Architecture-Working-Group, right? Same question, next week or?

carlocaione commented 1 year ago

Definitely okay for me. Is it settled? This one https://github.com/zephyrproject-rtos/zephyr/wiki/Architecture-Working-Group, right?

yup.

Same question, next week or?

Yeah, not sure, people in Europe are gradually disappearing because of summer holidays. @carlescufi can we have this in the next arch meeting whenever that will be?

povergoing commented 1 year ago

Yeah, not sure, people in Europe are gradually disappearing because of summer holidays. @carlescufi can we have this in the next arch meeting whenever that will be?

Not a big deal about the slot, but just letting you know, the week of the next week( 24 July ~ 30 July) might be not available for me :)

microbuilder commented 1 year ago

Yeah, not sure, people in Europe are gradually disappearing because of summer holidays. @carlescufi can we have this in the next arch meeting whenever that will be?

Actually, I have a conflict for the security call this Monday (17 July), but would like to discuss this so I'd personally prefer a later security call, otherwise the arch call works for me ... although we'll have less time in that one since the agenda is usually fuller.

@povergoing I think anyone interested in the details can just watch the recording of your talk, and we can use an eventual security WG or arch call to ask questions, etc. It sounds like we need to work on the date, though, to make sure we can find something that works for you, and the hours are never very friendly for Asia.

EDIT: Sorry, missed some context on wanting to move this to Arch and not security to include more people. That call works fine for me any week.

microbuilder commented 1 year ago

For context:

* Slides from the ZDS 2023 talk: https://static.sched.com/hosted_files/eoss2023/b0/Introduce%20Hardware-Level%20Device%20Isolation%20to%20Zephyr.pdf

* The video is visible here: https://www.youtube.com/watch?v=PnyZ77B8pgA

@povergoing Perhaps you can edit your RFC to include these links near the top for people, since they'll likely miss them in the comments.

povergoing commented 1 year ago

@povergoing Perhaps you can edit your RFC to include these links near the top for people, since they'll likely miss them in the comments.

Thanks for the reminder! I have put the link at the top of the RFC.

ceolin commented 1 year ago

@ceolin No problem for me. IIRC, the meeting is https://github.com/zephyrproject-rtos/zephyr/wiki/Security-Working-Group?

Can you present to the Architecture meeting instead? Bigger audience ad this is not 100%-only security related.

@carlocaione security working group focus on the, well, security implications in this feature and it is usually done before other things. Nothing we discuss is 100% security related and as we could see by my comments I wanted to focus on the security implications of this proposal.

It is valid, and I would say good practice, in this type of proposal, discuss first the security implications (in the security wg) and after that architecture and implementation details in other forums like architecture meeting.

dcpleung commented 1 year ago

@povergoing The sys_mm is for mapping memory where how to map memory is dependent on the compiled driver. I think that eventually we will need to have a dispatcher to call different memory management drivers based on addresses and attributes. So it would be a good idea to group them under sys_mm from the beginning. App will only need to call sys_mm_* APIs for memory mapping instead of having to deal with multiple APIs.

Note that sys_mm_drv_* are for MM drivers. I am thinking sys_mm_dev_* for device related APIs if you need anything that is not covered by the current API that strictly applies to devices.

ceolin commented 1 year ago

Yeah, not sure, people in Europe are gradually disappearing because of summer holidays. @carlescufi can we have this in the next arch meeting whenever that will be?

Actually, I have a conflict for the security call this Monday (17 July), but would like to discuss this so I'd personally prefer a later security call, otherwise the arch call works for me ... although we'll have less time in that one since the agenda is usually fuller.

@povergoing I think anyone interested in the details can just watch the recording of your talk, and we can use an eventual security WG or arch call to ask questions, etc. It sounds like we need to work on the date, though, to make sure we can find something that works for you, and the hours are never very friendly for Asia.

EDIT: Sorry, missed some context on wanting to move this to Arch and not security to include more people. That call works fine for me any week.

@povergoing as Kevin won't be able to attend 07/17, I'll push it to the swg meeting after that, feel free to have it discussed in any other group in the meantime.

povergoing commented 1 year ago

@povergoing The sys_mm is for mapping memory where how to map memory is dependent on the compiled driver. I think that eventually we will need to have a dispatcher to call different memory management drivers based on addresses and attributes. So it would be a good idea to group them under sys_mm from the beginning. App will only need to call sys_mm_* APIs for memory mapping instead of having to deal with multiple APIs.

Note that sys_mm_drv_* are for MM drivers. I am thinking sys_mm_dev_* for device related APIs if you need anything that is not covered by the current API that strictly applies to devices.

@dcpleung Good to know, I will take a further look at this API.

@povergoing as Kevin won't be able to attend 07/17, I'll push it to the swg meeting after that, feel free to have it discussed in any other group in the meantime.

@ceolin Thank you. BTW, I will take leave next week (07/24 - 07/30), I think it might be pushed into next month.

ceolin commented 1 year ago

@povergoing are you available to discuss it in the next security working group ? (08/14)

povergoing commented 1 year ago

@povergoing are you available to discuss it in the next security working group ? (08/14)

Hi @ceolin, yes, I can join the meeting.

gregshue commented 1 year ago

Whatever we end up with needs to address the following use cases and questions:

The HW handoff from immutable boot code to a mutable application image. The boot code may use DMAs for HW assist in app image verification driving a graphics display. Some of these DMAs will need to be cleanly shut down. Others may need to continue running across the handoff (e.g., CGD).
The HW handoff from a running application image to catastrophic error handling logic. Catastrophic error handling may need to update a display and/or record failure information with a device currently being used by a DMA.
Support detecting when HW accesses a side of a double-buffer currently under FW control.
Support descriptor-driven DMAs.
Support controllers with multiple DMA channels, e.g., for a composite USB device (printer + scanner + USBMS + proprietary, where all interfaces are alive at once and each uses multiple endpoints).
Support multiple instances of a controller device.
Support recovery from detected failures without a device reset.

microbuilder commented 1 year ago

Whatever we end up with needs to address the following use cases and questions:

The HW handoff from immutable boot code to a mutable application image. The boot code may use DMAs for HW assist in app image verification driving a graphics display. Some of these DMAs will need to be cleanly shut down. Others may need to continue running across the handoff (e.g., CGD).

The HW handoff from a running application image to catastrophic error handling logic. Catastrophic error handling may need to update a display and/or record failure information with a device currently being used by a DMA.

Support detecting when HW accesses a side of a double-buffer currently under FW control.

Support descriptor-driven DMAs.

Support controllers with multiple DMA channels, e.g., for a composite USB device (printer + scanner + USBMS + proprietary, where all interfaces are alive at once and each uses multiple endpoints).

Support multiple instances of a controller device.

Support recovery from detected failures without a device reset.

There's room here for an iterative development process as well, where not every use case needs to be accounted for from day one, or progress never happens. We'd certainly never have had DT or KConfig or any major subsystem if the standard was global coverage of every use case from day one.

It's worth taking a broad view, but I think it's also reasonable here to start much smaller and deal with things like bootloader handoff in a second stage.

Placing every possible use case as a need or must have risks smothering any sort of progress from the get go IMHO.

gregshue commented 1 year ago

I definitely agree that design should be iterative. Getting consensus on user needs will also be a bit iterative. Once user needs are agreed to then we have a way to measure whether or not we are done. Releasing an API that doesn't meet all the agreed-upon needs just sets us up for API changes fairly soon down the road.

Placing every possible use case as a need or must have risks smothering any sort of progress from the get go IMHO.

I understand it can be daunting, though before long we'll have to concede that safety-critical and secure SDLC demand a requirements-driven process.

"Bootloader handoff in a second stage" is a great place to start.

carlocaione commented 1 year ago

Whatever we end up with needs to address the following use cases and questions:

Can we please start small?

gregshue commented 1 year ago

We can start as small as you like, but it won't be done until everything is addressed. Over the years I have found that error handling generally drives design. Perhaps we should look at handling logic needed within k_panic()?

povergoing commented 1 year ago

It's worth taking a broad view, but I think it's also reasonable here to start much smaller and deal with things like bootloader handoff in a second stage.

Placing every possible use case as a need or must have risks smothering any sort of progress from the get go IMHO.

Can we please start small?

I also prefer starting small.

Whatever we end up with needs to address the following use cases and questions:

I think these can be considered one by one in the future. As this RFC aims to add an SMMU driver and introduce a more "generic" API for SMMU-like technologies, thus the first priority thing we need to figure out is, if the subsystem API is "generic" enough to solve maybe 80% usecases? Since Zephyr uses linear mem map, IMHO, I think the current API can solve most common usecases(helping DMA drivers to restrict the devices)

gregshue commented 1 year ago

I think these can be considered one by one in the future.

Every time we postpone addressing a use case we set up the OOT users for another breaking change.

Which 80% of use cases are you proposing we address?

How about we start by answering these questions:

Is this proposal within the identified safety scope?
Is this proposal within the identified security scope?

ickochar commented 1 year ago

As this RFC aims to add an SMMU driver and introduce a more "generic" API for SMMU-like technologies, thus the first priority thing we need to figure out is, if the subsystem API is "generic" enough to solve maybe 80% usecases?

There are 2 main use cases on SMMU :

Access Control (covered as part of this RFC)
Address Translation

Since Zephyr uses linear mem map

Zephyr supports both linear map ( PA = VA ) and address translation for MMU ( PA = VA + offset) using following macros : DEVICE_MMIO_MAP(dev, K_MEM_CACHE_NONE) DEVICE_MMIO_GET(dev) DEVICE_MMIO_ROM_INIT(DT_DRV_INST(n)) ...

As part of SMMU design, can we have a design in place to accommodate following use cases at minimum ?

Device Isolation
Address translation ( PA = VA + offset)

And both these can be achieved using static table and stage 1 translation design itself.

P.S. We are working on ARM SMMU v3 currently and have driver to do stage 2 translation for our use case. If needed we can share more details.

povergoing commented 1 year ago

As part of SMMU design, can we have a design in place to accommodate following use cases at minimum ?

Device Isolation Address translation ( PA = VA + offset) And both these can be achieved using static table and stage 1 translation design itself.

Well spotted, yes you are right. Actually, we are going to borrow some API designs from Linux/BSD Unix, which is something like dev_map(domain, virt, phys, size) and I think the API can fulfill the use case you mentioned. What's more, I am thinking about whether this API will make sense for an MPU system (e.g. IOMPU which has only the isolation feature, no address translation)

P.S. We are working on ARM SMMU v3 currently and have driver to do stage 2 translation for our use case. If needed we can share more details.

Sure, definitely :)

ickochar commented 1 year ago

I am thinking about whether this API will make sense for an MPU system (e.g. IOMPU which has only the isolation feature, no address translation)

We can give an kconfig option to support address translation. Something on similar lines of option we have in MMU module - CONFIG_KERNEL_DIRECT_MAP

carlescufi commented 12 months ago

@povergoing please ping me in the Discord #architecture channel

ddavidebor commented 12 months ago

Would this require people writing DMA drivers to be aware and understand this subsystem? or would this be optional?

povergoing commented 11 months ago

Would this require people writing DMA drivers to be aware and understand this subsystem? or would this be optional?

I think it depends, 1) If your DMA driver already used the "standard" API to alloc DMA buffer (e.g. Memory Blocks Allocator, or something new: https://github.com/zephyrproject-rtos/zephyr/issues/57220), there is no need to be aware of this subsystem (we are going to enhance these "standard" APIs). 2) If your DMA driver is customized (for example, you statically alloc buffer) or you have very special requirements to map/re-map the buffer to the Device, you need to be aware of the subsystem. Additionally, this subsystem is an enhancement, and in most cases, the system can work without this subsystem, if you have the requirements of security, I suggest it would be better to have a basic understanding of this subsystem.

zephyrproject-rtos / zephyr