zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

About MMU mapping on ARM64 #46477

Open carlocaione opened 2 years ago

carlocaione commented 2 years ago

Facts

On ARM64 we can MMU-map a memory region in two different ways:

Direct interface with MMU code for direct mapping

This is done by the ARM64 MMU code to set up the basic Zephyr regions (text, data, etc.) in: https://github.com/zephyrproject-rtos/zephyr/blob/bfec3b2ab4cc19a204a6f390c2c3b176cb32a695/arch/arm64/core/mmu.c#L649-L679

but it is also used by the SoC-specific code to map regions for peripherals that do not support the device MMIO APIs, for example in: https://github.com/zephyrproject-rtos/zephyr/blob/bfec3b2ab4cc19a204a6f390c2c3b176cb32a695/soc/arm64/qemu_cortex_a53/mmu_regions.c#L11-L22

This mapping is done directly in the MMU driver code and it is usually a direct (1:1) mapping.
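
For reference, a minimal sketch of what such a SoC-level table typically looks like, modeled on the qemu_cortex_a53 file linked above (the uart0 node label and the attribute flags here are only illustrative):

    /* Illustrative SoC-level mmu_regions.c sketch; device name, node
     * label and attributes are placeholders.
     */
    #include <zephyr/devicetree.h>
    #include <zephyr/sys/util.h>
    #include <zephyr/arch/arm64/arm_mmu.h>

    static const struct arm_mmu_region mmu_regions[] = {
        /* 1:1 (virt == phys) device mapping taken straight from the DT */
        MMU_REGION_FLAT_ENTRY("UART",
                              DT_REG_ADDR(DT_NODELABEL(uart0)),
                              DT_REG_SIZE(DT_NODELABEL(uart0)),
                              MT_DEVICE_nGnRnE | MT_P_RW_U_NA | MT_NS),
    };

    const struct arm_mmu_config mmu_config = {
        .num_regions = ARRAY_SIZE(mmu_regions),
        .mmu_regions = mmu_regions,
    };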

Using the device MMIO (or MMU) APIs

There has lately been an effort to make drivers use the device MMIO APIs. These APIs leverage the Zephyr MMU code to automatically map the physical MMIO region of a peripheral to a virtual memory region at init time (see include/zephyr/sys/device_mmio.h).

In general the mapping is not a direct mapping; instead, the virtual region is carved out of a pool of virtual addresses configured using CONFIG_KERNEL_VM_BASE and CONFIG_KERNEL_VM_SIZE.
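
For reference, a minimal sketch of a driver consuming these APIs (the driver name, compatible and register offset are placeholders; the real macros live in include/zephyr/sys/device_mmio.h):

    #define DT_DRV_COMPAT vnd_foo   /* placeholder compatible */

    #include <zephyr/device.h>
    #include <zephyr/sys/device_mmio.h>
    #include <zephyr/sys/sys_io.h>

    struct foo_config {
        DEVICE_MMIO_ROM;        /* phys base + size, taken from the DT */
    };

    struct foo_data {
        DEVICE_MMIO_RAM;        /* virt base, filled in at init time */
    };

    static int foo_init(const struct device *dev)
    {
        /* With an MMU this maps the phys region into the virtual pool
         * and stores the resulting virt address; without an MMU it just
         * records the phys address.
         */
        DEVICE_MMIO_MAP(dev, K_MEM_CACHE_NONE);
        return 0;
    }

    static uint32_t foo_read_reg(const struct device *dev, uint32_t offset)
    {
        /* All register accesses go through the mapped address */
        return sys_read32(DEVICE_MMIO_GET(dev) + offset);
    }

    static const struct foo_config foo_cfg_0 = {
        DEVICE_MMIO_ROM_INIT(DT_DRV_INST(0)),
    };
    static struct foo_data foo_data_0;

    DEVICE_DT_INST_DEFINE(0, foo_init, NULL, &foo_data_0, &foo_cfg_0,
                          POST_KERNEL, CONFIG_KERNEL_INIT_PRIORITY_DEVICE,
                          NULL);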

Problems

There are several.

  1. The two methods are orthogonal; the only point of contact is the MMU driver that actually performs the mapping.
  2. The Zephyr MMU code uses a simple mechanism to keep track of the allocated pages, and this is bypassed by the direct interface with the MMU code, so in theory there could be conflicts.
  3. Especially on ARM64 we have (theoretically) plenty of virtual memory, so we would really like to do direct mapping for MMIO driver regions, but this is not currently possible with the Zephyr MMU code.

Solution?

The easiest one is to give up the direct interface and instead rely exclusively on the Zephyr MMU code. This would force us to either give up the 1:1 mapping or add support for it.

Tagging the main actors involved @dcpleung @npitre @povergoing

povergoing commented 2 years ago

My concern would be that the MPU should support the MMIO region too, but these MMIO APIs cannot be reused by the MPU if it is not a 1:1 mapping design, can they? You can simply consider the MPU as an MMU that only supports 1:1 mapping.

Does the 1:1 mapping or direct mapping mean virt_addr = phy_addr, IIUC? I don't see why the APIs in device_mmio.h need non-direct mapping, since Zephyr is designed to be a single-memory-space OS.

I am not sure if it is suitable or how difficult it would be, but is it possible that we re-use the kernel partitions? Add mmu_zephyr_ranges and all the peripheral regions defined in the DTS (or a subset of them, marked by some label) into kernel partitions, so that the MMU or MPU only has to consider how to fulfill the kernel partitions. The APIs in device_mmio.h would then do nothing but add the region into the kernel partitions.

carlocaione commented 2 years ago

My concern would be that the MPU should support the MMIO region too, but these MMIO APIs cannot be reused by the MPU if it is not a 1:1 mapping design, can they? You can simply consider the MPU as an MMU that only supports 1:1 mapping.

Well, I was not aware of that and this is definitely concerning (:hankey:)

Does the 1:1 mapping or direct mapping mean virt_addr = phy_addr, IIUC?

Yes.

I don't see why the APIs in device_mmio.h need non-direct mapping, since Zephyr is designed to be a single-memory-space OS.

I think @dcpleung could shed some light on this.

But the point is that when a physical address needs to be mapped using z_phys_map(), the destination virtual address is obtained from a pool of virtual addresses and then mapped using arch_mem_map(). See: https://github.com/zephyrproject-rtos/zephyr/blob/d130160813fb52ee87a2d0c4a4fb8e57d466f181/kernel/mmu.c#L736-L751 so it is definitely not a 1:1 mapping (AFAICT).
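
For illustration, this is roughly what a caller of z_phys_map() ends up doing (the physical address and size are placeholders):

    /* Sketch of a z_phys_map() call from the caller's side */
    #include <zephyr/sys/mem_manage.h>
    #include <zephyr/sys/sys_io.h>

    static void map_example(void)
    {
        uint8_t *virt;

        /* The virt address is handed out from the CONFIG_KERNEL_VM_BASE/
         * CONFIG_KERNEL_VM_SIZE pool, so in general virt != 0xff000000.
         */
        z_phys_map(&virt, 0xff000000UL, 0x1000,
                   K_MEM_PERM_RW | K_MEM_CACHE_NONE);

        /* From here on the device must be accessed through virt */
        (void)sys_read32((mem_addr_t)virt);
    }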

I am not sure if it is suitable or how difficult it would be, but is it possible that we re-use the kernel partitions? Add mmu_zephyr_ranges and all the peripheral regions defined in the DTS (or a subset of them, marked by some label) into kernel partitions, so that the MMU or MPU only has to consider how to fulfill the kernel partitions. The APIs in device_mmio.h would then do nothing but add the region into the kernel partitions.

Uhm, this seems more complicated than adding support for 1:1 in the current API.

carlocaione commented 2 years ago

Well, I was not aware of that and this is definitely concerning (hankey)

Oh well, maybe not. I just checked, and when the MMU is not present (i.e. you have an MPU), the device MMIO APIs do not map anything and you are basically back to accessing the physical address directly.

dcpleung commented 2 years ago

The device MMIO API was introduced before I took over userspace, so the design decision is a bit fuzzy to me. But IIRC it works similarly to the Linux kernel, where the MMIO range is in general not a 1:1 mapping (at least on x86).

Just wondering what would be the use case for having 1:1 mapping? I can see that it would make debugging easier, but in production, does it matter where the hardware registers are mapped?

carlocaione commented 2 years ago

Just wondering what would be the use case for having 1:1 mapping? I can see that it would make debugging easier, but in production, does it matter where the hardware registers are mapped?

Well, the big issue with Zephyr is that 95% of the drivers are not using the device MMIO API, which means that they are basically accessing the physical address all the time (usually the physical address is retrieved from the DT with the usual DT_REG_ADDR, saved into the config struct and used to access the various registers).

So either you fix the driver by adding support for the MMIO API (so the driver uses the virtual address instead of the physical one), or you add a 1:1 mapping and leave the driver unfixed. See for example what happened here https://github.com/zephyrproject-rtos/zephyr/pull/46443#discussion_r895421484.

This is a huge problem IMHO.
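
For context, the unconverted driver pattern described above looks roughly like this (driver name, node label and register offset are placeholders):

    /* Sketch of the typical unconverted driver pattern: the phys address
     * from the DT is stored in the config struct and dereferenced
     * directly, which only works if a 1:1 mapping for the peripheral
     * exists (or the MMU is off).
     */
    #include <zephyr/device.h>
    #include <zephyr/devicetree.h>
    #include <zephyr/sys/sys_io.h>

    struct foo_config {
        uintptr_t base;         /* physical base address */
    };

    /* Physical address taken straight from the DT; foo0 is a placeholder node */
    static const struct foo_config foo_cfg = {
        .base = DT_REG_ADDR(DT_NODELABEL(foo0)),
    };

    static uint32_t foo_read_status(const struct device *dev)
    {
        const struct foo_config *cfg = dev->config;

        /* Register access straight through the physical address */
        return sys_read32(cfg->base + 0x04);
    }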

dcpleung commented 2 years ago

Well, the big issue with Zephyr is that 95% of the drivers are not using the device MMIO API, which means that they are basically accessing the physical address all the time (usually the physical address is retrieved from the DT with the usual DT_REG_ADDR, saved into the config struct and used to access the various registers).

So either you fix the driver by adding support for the MMIO API (so the driver uses the virtual address instead of the physical one), or you add a 1:1 mapping and leave the driver unfixed. See for example what happened here #46443 (comment).

This is a huge problem IMHO.

Drivers not using the MMIO API are indeed a huge issue when dealing with the MMU, as those addresses are not accessible by default. Though I was asking what the use cases are when using the MMIO API. I would assume a proper MMU implementation allows I/O addresses to be mapped into virtual space.

carlocaione commented 2 years ago

Though I was asking what the use cases are when using the MMIO API.

Oh right, I probably explained myself badly.

So, if you are using the MMIO API and the driver supports it, there is indeed no problem; we are fine in that case even without a 1:1 mapping.

We still have to deal with the case where the driver is not using the MMIO API. On ARM64 we currently sidestep this by creating the 1:1 mapping directly in the MMU driver, entirely bypassing the Zephyr MMU code. So my suggestion was for this second case: removing the direct interface with the MMU driver and instead relying on the Zephyr MMU code to create the 1:1 mapping for all the drivers still not supporting the MMIO API.
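
As a rough illustration of that suggestion, the generic layer could grow a 1:1 variant of z_phys_map(); the helper name below is hypothetical and does not exist today:

    /* Hypothetical sketch only -- z_phys_identity_map() does not exist.
     * The idea is that something like this would live next to
     * z_phys_map() in kernel/mmu.c, where the generic code could record
     * the region (so it cannot collide with addresses handed out from
     * the virtual pool) and then ask the arch layer for a virt == phys
     * mapping, instead of the SoC mmu_regions table programming the MMU
     * driver directly.
     */
    void z_phys_identity_map(uintptr_t phys, size_t size, uint32_t flags)
    {
        /* ...bookkeeping of [phys, phys + size) would go here... */

        arch_mem_map((void *)phys, phys, size, flags);
    }

    /* An unconverted driver (or the SoC code) could then keep using the
     * physical address directly:
     *
     *   z_phys_identity_map(DT_REG_ADDR(DT_NODELABEL(uart0)),
     *                       DT_REG_SIZE(DT_NODELABEL(uart0)),
     *                       K_MEM_PERM_RW | K_MEM_CACHE_NONE);
     */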

dcpleung commented 2 years ago

Maybe we can convert those drivers to use the device MMIO API when they are being included? TBH, anything we do now to make those non-"device MMIO API" enabled devices work would be a stop-gap effort. So I think the proper way going forward is to convert them to use device MMIO API. Though... I don't know how many you will need to do at the moment. Could you hazard a guess on what you need for your development at the moment?

povergoing commented 2 years ago

removing the direct interface with the MMU driver and instead relying on the Zephyr MMU code to create the 1:1 mapping for all the drivers still not supporting the MMIO API.

Cool, that means, if we want MPU to support MMIO API instead of a big device region, we can extend the non-MMU case?

carlocaione commented 2 years ago

Maybe we can convert those drivers to use the device MMIO API when they are being included? TBH, anything we do now to make those non-"device MMIO API" enabled devices work would be a stop-gap effort. So I think the proper way going forward is to convert them to use device MMIO API.

Yes, this is indeed what I'm trying to do while reviewing new drivers submission: convince people to use MMIO API.

I don't know how many you will need to do at the moment. Could you hazard a guess on what you need for your development at the moment?

I don't need any for my development but: (1) this must be considered for new drivers submission and (2) this is part of a cleanup work to remove the mmu_regions for good.

About point (2): in general, having the two methods (the MMIO API and the direct mapping using mmu_regions) is confusing for developers and error-prone in the long term (what if, for example, the MMIO API maps to a virtual address that is already mapped by the MMU driver?).

carlocaione commented 2 years ago

Cool, that means, if we want MPU to support MMIO API instead of a big device region, we can extend the non-MMU case?

Possibly? But the MPU case is definitely simpler (and more limited, since you have a limited number of slots) and I'm not sure going through the MMIO API is worth it.

ibirnbaum commented 2 years ago

As far as I can tell from this discussion, the MMIO interface is intended for mapping devices' register spaces, but what about DMA areas?

Take the Xilinx Ethernet driver, for example: the device trees of the two SoC families that support it define an OCM memory area to be used for DMA. I can obtain that physical address via a 'chosen' entry which is configurable at the board level. At the SoC level, an identity mapping is set up via the mmu_regions table using just that information from the DT.

The driver declares the DMA area for each activated instance of the device (the size may vary between instances; DMA parameters such as buffer count/size are configurable per device) as a struct, one instance of which is placed in the OCM memory area using section and __aligned attributes:

    #define ETH_XLNX_GEM_DMA_AREA_INST(port) \
        static struct eth_xlnx_dma_area_gem##port eth_xlnx_gem##port##_dma_area \
            __ocm_bss_section __aligned(4096);

Any access to those structs happens on the basis of the physical address, and the controller requires writing the physical addresses of certain members of that struct to its registers (namely the TX queue base address and the RX queue base address), which can simply be obtained using &eth_xlnx_gem##port##_dma_area.some_member.

Will there be a way to map a DMA area aside from a device's register space, and will there be a way to resolve its physical address? What about situations like this one where the linker inserts references to the physical address based on section placement of data?

ibirnbaum commented 2 years ago

Also, if getting rid of the mmu_regions table entirely is the eventual goal, how will we handle required mappings that are not associated with any driver, but are required for the SoC code and maybe also some driver code to work properly? For example, the Zynq maps:

Will all that be moved to the device tree, including permissions?

carlocaione commented 2 years ago

As far as I can tell from this discussion, the MMIO interface is intended for mapping devices' register spaces, but what about DMA areas?

That's not really part of this discussion. The MMIO API is used only to map the MMIO register space of the drivers; it's basically the Zephyr equivalent of the devm_ioremap_resource() Linux call.

Take the Xilinx Ethernet driver, for example: the device trees of the two SoC families that support it define an OCM memory area to be used for DMA. I can obtain that physical address via a 'chosen' entry which is configurable at the board level. At the SoC level, an identity mapping is set up via the mmu_regions table using just that information from the DT.

You can keep doing that if you want.

Will there be a way to map a DMA area aside from a device's register space, and will there be a way to resolve its physical address?

You can create a 1:1 mapping using the mmu_regions table and then use the physical address, or you can use something like z_phys_map() to create the mapping, taking care to use the returned virtual address.
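
A rough sketch of the second option (the ocm0 node label is a placeholder):

    /* Sketch: mapping a DMA area through the generic code instead of the
     * mmu_regions table. CPU accesses go through the returned virt
     * pointer, while the physical address is still what gets written
     * into the controller's queue base registers.
     */
    #include <zephyr/devicetree.h>
    #include <zephyr/sys/mem_manage.h>

    static uint8_t *dma_virt;

    static void dma_area_map(void)
    {
        z_phys_map(&dma_virt,
                   DT_REG_ADDR(DT_NODELABEL(ocm0)),
                   DT_REG_SIZE(DT_NODELABEL(ocm0)),
                   K_MEM_PERM_RW | K_MEM_CACHE_NONE);
    }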

Also, if getting rid of the mmu_regions table entirely is the eventual goal, how will we handle required mappings that are not associated with any driver, but are required for the SoC code and maybe also some driver code to work properly?

I want to get rid of the mmu_regions table where it is used to map the MMIO regions of drivers, because that is something we should have done a long time ago. All the other use cases are to be evaluated on a case-by-case basis. If you need to map anything different from that, you can keep using it, or you can use something fancier like z_phys_map() or k_mem_map().

Will all that be moved to the device tree, including permissions?

No.

ibirnbaum commented 2 years ago

@carlocaione Thanks for the info!

dcpleung commented 2 years ago

I am all for nudging everyone to use the device MMIO API. :)