zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.79k stars 6.58k forks source link

devicetree: overloaded DT_REG_ADDR() and DT_REG_SIZE() for PCI devices #26109

Closed andrewboie closed 1 year ago

andrewboie commented 4 years ago

Is your enhancement proposal related to a problem? Please describe. For PCI devices, DT_REG_ADDR() and DT_REG_SIZE() do not contain the base address and size of MMIO regions. Instead, they respectively correspond to the PCI-e bus device function and ID values. See soc/x86/apollo_lake/soc.c to see what I mean, note how device_mmu_region() is used.

This is not ideal as we are overloading information. One would expect these macros to always return a pointer and a size_t, and not a pcie_bdf_t/pcie_id_t pair in some cases but not others.

Describe the solution you'd like Some other dedicated DT macros for obtaining this information, perhaps. I don't have a good sense on the best way to approach this.

andrewboie commented 4 years ago

@galak not sure why this is currently set up this way, probably out of expediency...the guy who wrote it has left the team so I'll never know for sure.

Wanted to know if you had any ideas on the best way to solve this.

cc @dcpleung @mbolivar @henrikbrixandersen

galak commented 4 years ago

@andrewboie how much PCIe support do you see needing in the future. Trying to see if we need to finally deal with PCIe address in devicetree properly or come up with some zephyr specific solution.

andrewboie commented 4 years ago

@galak the specific use-case is setting up mappings in the MMU. For PCI-e devices, we need to probe bdf/id to get the base address and size of MMIO regions so we can install in the page tables. Currently this is done by passing the result of DT_REG_ADDR()/DT_REG_SIZE() to pcie_get_mbar() which is at minimum confusing.

We want to move driver MMIO mapping to the init functions. So in each driver we'd want something like the following pseudo-code, probably encapsulated into a helper functionb or macro:

if device_is_pci_device {
    base = pcie_query_base_address(<bdf from devicetree>, <id from devicetree>)
    size = pcie_query_size(<bdf from devicetree>, <id from devicetree>)
} else {
    base = <base addr from devicetree>
    size = <size from devicetree>
}

device_map_mmio(base, size)

I think this is all possible in the current scheme, it just looks weird. And there might be opportunities to do this probing behind the scenes, I'm not sure how much runtime logic we have for devicetree.

Later on when virtual memory enters the picture, device_map_mmio() will return a virtual mapping that the driver will need to store and use (see #26107)

galak commented 4 years ago

Is there a reason we don't probe the PCI bus vs having this stuff in devicetree?

andrewboie commented 4 years ago

Is there a reason we don't probe the PCI bus vs having this stuff in devicetree?

I don't understand this question, can you elaborate?

andyross commented 4 years ago

Was just pointed to this.

Is there a reason we don't probe the PCI bus vs having this stuff in devicetree?

Largely because it's not the Zephyr way. You probe a PCI bus by enumerating every bus/device pair (there are 8k, actually more for modern devices with more than one us segment) for its VID/DID values and comparing it to every potential device/driver in your app. Doing that naively is a performance problem, and doing it Right involves precompiling some kind of sorted table of devices vs. their IDs, which is significant build complexity.

And that's never been Zephyr's model anyway. We aren't generating a binary image for every PC, we're generating one for the particular app on a particular SoC. So it makes more sense to refer to devices by their unique bus ID and not "probe" for them via a vendor/device pair.

galak commented 2 years ago

Links to OpenFirmware specs that describe how to represent PCI/PCIe nodes in a devicetree:

https://www.devicetree.org/open-firmware/bindings/pci/pci2_1.pdf https://www.devicetree.org/open-firmware/bindings/pci/pci-express.txt

jhedberg commented 2 years ago

@galak could you address the specific details of Andy's last comment? The specs you referenced don't seem to offer an option where the BDF values would come from DTS. They also state that the reg property is mandatory for PCIe child nodes, so if we encode the BDF into another property it leaves the question of what to do with reg. Essentially, it seems to me that if we were to strictly follow these specs we would need to move to bus-wide probing. @andyross makes a pretty compelling case for why that might not be a good idea with Zephyr, and I've yet to see any counter arguments (besides "here are some specs", which isn't very helpful).

galak commented 2 years ago

@jhedberg the spec conveys how to encode bus/dev/func in the reg property. Look at how phys.hi cell:

3. Numerical Representation

    The numerical representation in PCI binding 2.1  section 2.2.1.1  has
    been modified slightly to accommodate extended PCI express configura-
    tion space of 4K bytes.

    Bit# 33222222 22221111 11111100 00000000
         10987654 32109876 54321098 76543210

    phys.hi cell:  nptx00ss bbbbbbbb dddddfff rrrrrrrr
    phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
    phys.lo cell:  llllllll llllllll llllllll llllllll

    where:

    n   is 0 if the address is relocatable, 1 otherwise
    p   is 1 if the addressable region is "prefetchable", 0 otherwise
    t   is 1 if the address is aliased (for non-relocatable I/O),  below
    1 MB (for Memory), or below 64 KB (for relocatable I/O).
    x    is there to take care of extended config space address bits  for
    PCI-E bus.  x must be 0, for ss != 0

    ss      is the space code, denoting the address space
    bbbbbbbb    is the 8-bit Bus Number
    ddddd   is the 5-bit Device Number
    fff         is the 3-bit Function Number
    rrrrrrrr    is the 8-bit Register Number
    hh...hh     is a 32-bit unsigned number
    ll...ll     is a 32-bit unsigned number
galak commented 2 years ago

Here's an example:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/platform/ce4100/falconfalls.dts#n79

andyross commented 2 years ago

But... why? Why can't we just make "bus", "device" and "function" integer properties and be done with it? I get that there's a specification, but... that's awful! Everyone can see that's awful, right? It's not even nibble-aligned, so you have to pull out a calculator just to match the packed numbers to ones you can read in documentation or lspci.

andyross commented 2 years ago

And to tie this back to my ancient comment above: Linux doesn't care that it's awful, because linux probes everything dynamically anyway. All they care about is uniqueness, not utility. We want these values so we can use them.

galak commented 2 years ago

But... why? Why can't we just make "bus", "device" and "function" integer properties and be done with it? I get that there's a specification, but... that's awful! Everyone can see that's awful, right? It's not even nibble-aligned, so you have to pull out a calculator just to match the packed numbers to ones you can read in documentation or lspci.

It is what it is, I'm not going to argue about why we need to conform to an existing specification.

galak commented 2 years ago

And to tie this back to my ancient comment above: Linux doesn't care that it's awful, because linux probes everything dynamically anyway. All they care about is uniqueness, not utility. We want these values so we can use them.

The spec predates linux. For the Zephyr case, how the bits are encoded shouldn't mater since its static data. So a simple PCI_BDF_TO_DT_REG() macro can hide all the weirdness.

andyross commented 2 years ago

(Probably too much typing for this subject. I'll disappear after this bit, I promise)

To be a little glib: does linux actually use these DTS files? I'm 94% certain that it does not, and that on effectively all deployed PCI devices[1] nothing looks at that packed B/D/F data and does its probing via VID/DID. In fact, I'll put better than even money down that the data that does exist in files we can find is going to be reliably incorrect in these settings, since they're never tested nor exercised.

I'm not anti-spec, but if we're going to bother trying to follow something it should be for a useful purpose and not just hobgoblin-mode compliance with weird historical minutiae. Broadly: if there are specifications we want to throw in the trash, this is surely one of them.

[1] i.e. literally nothing on a PC cares, so if there's a user it's some oddball Android integration, or WiFi router or whatever somewhere, but even then I can't think of any with PCI buses. The file you link to is a mostly-failed 13 year old attempt at pushing Lincroft/Moorestown at the "embedded box" world, so I can understand why it might have felt it needed that DTS file. But has anything since tried to repeat the mistake?

galak commented 2 years ago

There are a number of non-x86 embedded systems with PCI and PCIe on linux that utilize devicetree and use these mechanisms. I've developed embedded PowerPC support on linux and absolutely used the devicetree to describe PCI/PCIe busses. So this is used, while the majority of times things are probed as you state on linux, there are still a number of embedded type systems that don't do that for whatever reason and have the information described in the devicetree.

I pointed at the x86 dts as an example, more to show what the data should look like, I can spend a few hours and extract a dozen different cases on ARM, PowerPC, etc that also exist.

(I think the encoding is stupid as well, and thought that 10+ years ago when we used it linux as it inherited the OF standard specs that I referenced). It is what it is.

jhedberg commented 2 years ago

@galak thanks for pointing out where the BDF information is encoded. I initially skipped most of that part since it seemed irrelevant. The spec feels quite convoluted and would benefit of some examples of how devices with a given BDF should look like in DTS format.

Looking at the falcon falls DTS encoding, let's take this example:

multimedia@4,0 {
    compatible = "pci8086,2e5d.2",
            "pci8086,2e5d",
            "pciclass048000",
            "pciclass0480";

    reg = <0x12000 0x0 0x0 0x0 0x0>;
    interrupts = <4 1>;
};

Do I understand correctly that the 4,0 is what the Open Firmware refers to as the Text Representation DD,F, i.e. device number 4, function 0? And the bus number should be taken from some property of the parent node (PCI bus node)?

What do the five elements of the reg property refer to in that case? I'd have imagined the "numerical representation" to show up there, but I don't see any references to five elements (only three) in the Open Firmware specification.

jhedberg commented 2 years ago

Btw, I don't see how the VID/DID encoding into the compatible string would help us, since that would mean that drivers would need to have it as part of their code too, which is something I don't think we want. That said, the Open Firmware spec does seem to define standard device-id and vendor-id properties, so we should be able to use those.

galak commented 2 years ago

Do I understand correctly that the 4,0 is what the Open Firmware refers to as the Text Representation DD,F, i.e. device number 4, function 0? And the bus number should be taken from some property of the parent node (PCI bus node)?

That is correct. However, the DD = 4, and F = 0, is also encoded in the reg property and that is the best place to extract that data from.

What do the five elements of the reg property refer to in that case? I'd have imagined the "numerical representation" to show up there, but I don't see any references to five elements (only three) in the Open Firmware specification.

So the reg have 4 cells because, the bus has 3 #address-cells and 2 #size-cells:

        pci@3fc {
            #address-cells = <3>;
            #size-cells = <2>;

The #address-cells is 3, as that is how pci "addresses" are encoded as described in numerical representation (phys.hi, phys.mid, phys.low). PCI being a 64-bit bus, causes #size-cells to be 2 to describe any case of having a >4G size associated with an address.

For what you are wanting to encode the size cells in the reg will always be 0. Its the case when the devicetree represents the actual PCI bus addresses that size would come into play.

jhedberg commented 2 years ago

@galak I started experimenting with this "standard" based encoding, however it seems the Zephyr device tree processing doesn't like the multiple zero values in the reg property:

devicetree error: zero-sized 'reg' in <Node /pcie0/uart@19,2 in '/Users/jhedberg/src/zephyr/zephyr/misc/empty_file.c'> seems meaningless (maybe you want a size of one or #size-cells = 0 instead)

The DTS (for ehl_crb board) I'm experimenting looks something like this:

    pcie0 {
        #address-cells = <3>;
        #size-cells = <2>;
        compatible = "intel,pcie";
        ranges;

        uart2: uart@19,2 {
            compatible = "ns16550";

            reg = <PCIE_CFG_REG(0,0x19,2) 0x0 0x0 0x0 0x0>;
            vendor-id = <0x8086>;
            device-id = <0x4b4d>;
jhedberg commented 2 years ago

@galak one thing I don't quite get when looking at the falcon falls DTS in more detail, is that most PCI peripherals in it use "I/O space" (ss = 01) encoding for their reg property, when I think "Config space" (ss = 00) is the appropriate one. They also leave the phys.lo value as zero in every case for the I/O address, which doesn't make much sense to me.

jhedberg commented 2 years ago

@galak I also get this error if I fix up the "zero address" stuff and let the processing continue, i.e. Zephyr device tree parsing doesn't like the way the PCI DTS spec places the Text Representation as the unit address:

devicetree error: <Node /pcie0/uart@19,2 in '/Users/jhedberg/src/zephyr/zephyr/build/zephyr/zephyr.dts.pre', binding /Volumes/case-sensitive/zephyr/zephyr/dts/bindings/serial/ns16550.yaml> has non-hex unit address
jhedberg commented 2 years ago

@galak @mbolivar-nordic any feedback on my last three comments? Are these incompatibilities with the OpenFirmware devicetree spec and the currently implemented Zephyr DTS validation/parsing something that can/should be fixed in the Zephyr devicetree implementation, or should we diverge from the OpenFirmware spec?

galak commented 2 years ago

@galak one thing I don't quite get when looking at the falcon falls DTS in more detail, is that most PCI peripherals in it use "I/O space" (ss = 01) encoding for their reg property, when I think "Config space" (ss = 00) is the appropriate one. They also leave the phys.lo value as zero in every case for the I/O address, which doesn't make much sense to me.

Config space would be the PCI register config space vs general PCI MMIO access.

galak commented 2 years ago

devicetree error: <Node /pcie0/uart@19,2 in '/Users/jhedberg/src/zephyr/zephyr/build/zephyr/zephyr.dts.pre', binding /Volumes/case-sensitive/zephyr/zephyr/dts/bindings/serial/ns16550.yaml> has non-hex unit address

This is an area that needs improvement in the edtlib parsing.

galak commented 2 years ago

@galak @mbolivar-nordic any feedback on my last three comments? Are these incompatibilities with the OpenFirmware devicetree spec and the currently implemented Zephyr DTS validation/parsing something that can/should be fixed in the Zephyr devicetree implementation, or should we diverge from the OpenFirmware spec?

We need to enhance the tooling, NOT diverge from the OF spec.

galak commented 2 years ago

reg = <PCIE_CFG_REG(0,0x19,2) 0x0 0x0 0x0 0x0>;

how are you defining PCIE_CFG_REG? I need to review the OF pci spec on the zero size.

jhedberg commented 2 years ago

reg = <PCIE_CFG_REG(0,0x19,2) 0x0 0x0 0x0 0x0>;

how are you defining PCIE_CFG_REG? I need to review the OF pci spec on the zero size.

@galak it expands to an integer value as per the OF spec:

#define PCIE_CFG_REG(bus, dev, func) ((bus) << 16 | (dev) << 11 | (func) << 8)

Btw, would the DT syntax allow having a preprocessor macro that expands to the entire list of values between < and >, i.e. including the 0x0 values? I wasn't sure, so for the purposes of testing I went with a macro that just expands to the first value (which is the only non-zero one as per the OF spec).

What's the best way to make progress with the Zephyr tooling? I suppose I should open two new GitHub issues for the identified shortcomings?

galak commented 2 years ago

Btw, would the DT syntax allow having a preprocessor macro that expands to the entire list of values between < and >, i.e. including the 0x0 values? I wasn't sure, so for the purposes of testing I went with a macro that just expands to the first value (which is the only non-zero one as per the OF spec).

Sure, as long as the result of the preprocessor macro is valid DTS data.

What's the best way to make progress with the Zephyr tooling? I suppose I should open two new GitHub issues for the identified shortcomings?

As with all things, PRs are welcome to the tooling for the enhancements you need :).

galak commented 2 years ago

( For the zero-size reg issue in edtlib):

  1. Add device_type = "pci" to dts/bindings/pcie/host/pcie-controller.yaml (and update any dts files accordingly)
  2. Update For the devicetree error: zero-sized 'reg' issue, we should update the check to ignore pci devices (do this by checking if the parent node has device_type = "pci".
jhedberg commented 2 years ago

@galak the Devicetree spec (version 0.3) states:

The device_type property was used in IEEE 1275 to describe the device’s FCode programming model. Because DTSpec does not have FCode, new use of the property is deprecated, and it should be included only on cpu and memory nodes for compatibility with IEEE 1275–derived devicetrees.

So do we really want to extend our usage of it? I didn't (quickly looking) find any place in the specification that would forbid zero values for the register size, so maybe this check could be removed entirely from edtlib?

galak commented 2 years ago

So do we really want to extend our usage of it? I didn't (quickly looking) find any place in the specification that would forbid zero values for the register size, so maybe this check could be removed entirely from edtlib?

I think its good to have, we can change it to a warning and look to add a check if its a pci device to ignore it.

mbolivar-nordic commented 2 years ago

@jhedberg can you file a bug or bugs with reproducers for your edtlib issues and assign to me?

jhedberg commented 1 year ago

We decided to go with with a centralised single scan of PCIe devices that builds a table of VID/DID and matching BDF values for all discovered devices. This means that the reg property is no-longer used in PCIe DTS nodes, i.e. merging the PR #52107 should resolve this issue.