Closed andrewboie closed 1 year ago
@galak not sure why this is currently set up this way, probably out of expediency...the guy who wrote it has left the team so I'll never know for sure.
Wanted to know if you had any ideas on the best way to solve this.
cc @dcpleung @mbolivar @henrikbrixandersen
@andrewboie how much PCIe support do you see needing in the future. Trying to see if we need to finally deal with PCIe address in devicetree properly or come up with some zephyr specific solution.
@galak the specific use-case is setting up mappings in the MMU. For PCI-e devices, we need to probe bdf/id to get the base address and size of MMIO regions so we can install in the page tables. Currently this is done by passing the result of DT_REG_ADDR()/DT_REG_SIZE() to pcie_get_mbar() which is at minimum confusing.
We want to move driver MMIO mapping to the init functions. So in each driver we'd want something like the following pseudo-code, probably encapsulated into a helper functionb or macro:
if device_is_pci_device {
base = pcie_query_base_address(<bdf from devicetree>, <id from devicetree>)
size = pcie_query_size(<bdf from devicetree>, <id from devicetree>)
} else {
base = <base addr from devicetree>
size = <size from devicetree>
}
device_map_mmio(base, size)
I think this is all possible in the current scheme, it just looks weird. And there might be opportunities to do this probing behind the scenes, I'm not sure how much runtime logic we have for devicetree.
Later on when virtual memory enters the picture, device_map_mmio() will return a virtual mapping that the driver will need to store and use (see #26107)
Is there a reason we don't probe the PCI bus vs having this stuff in devicetree?
Is there a reason we don't probe the PCI bus vs having this stuff in devicetree?
I don't understand this question, can you elaborate?
Was just pointed to this.
Is there a reason we don't probe the PCI bus vs having this stuff in devicetree?
Largely because it's not the Zephyr way. You probe a PCI bus by enumerating every bus/device pair (there are 8k, actually more for modern devices with more than one us segment) for its VID/DID values and comparing it to every potential device/driver in your app. Doing that naively is a performance problem, and doing it Right involves precompiling some kind of sorted table of devices vs. their IDs, which is significant build complexity.
And that's never been Zephyr's model anyway. We aren't generating a binary image for every PC, we're generating one for the particular app on a particular SoC. So it makes more sense to refer to devices by their unique bus ID and not "probe" for them via a vendor/device pair.
Links to OpenFirmware specs that describe how to represent PCI/PCIe nodes in a devicetree:
https://www.devicetree.org/open-firmware/bindings/pci/pci2_1.pdf https://www.devicetree.org/open-firmware/bindings/pci/pci-express.txt
@galak could you address the specific details of Andy's last comment? The specs you referenced don't seem to offer an option where the BDF values would come from DTS. They also state that the reg
property is mandatory for PCIe child nodes, so if we encode the BDF into another property it leaves the question of what to do with reg
. Essentially, it seems to me that if we were to strictly follow these specs we would need to move to bus-wide probing. @andyross makes a pretty compelling case for why that might not be a good idea with Zephyr, and I've yet to see any counter arguments (besides "here are some specs", which isn't very helpful).
@jhedberg the spec conveys how to encode bus/dev/func in the reg property. Look at how phys.hi cell:
3. Numerical Representation
The numerical representation in PCI binding 2.1 section 2.2.1.1 has
been modified slightly to accommodate extended PCI express configura-
tion space of 4K bytes.
Bit# 33222222 22221111 11111100 00000000
10987654 32109876 54321098 76543210
phys.hi cell: nptx00ss bbbbbbbb dddddfff rrrrrrrr
phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
phys.lo cell: llllllll llllllll llllllll llllllll
where:
n is 0 if the address is relocatable, 1 otherwise
p is 1 if the addressable region is "prefetchable", 0 otherwise
t is 1 if the address is aliased (for non-relocatable I/O), below
1 MB (for Memory), or below 64 KB (for relocatable I/O).
x is there to take care of extended config space address bits for
PCI-E bus. x must be 0, for ss != 0
ss is the space code, denoting the address space
bbbbbbbb is the 8-bit Bus Number
ddddd is the 5-bit Device Number
fff is the 3-bit Function Number
rrrrrrrr is the 8-bit Register Number
hh...hh is a 32-bit unsigned number
ll...ll is a 32-bit unsigned number
But... why? Why can't we just make "bus", "device" and "function" integer properties and be done with it? I get that there's a specification, but... that's awful! Everyone can see that's awful, right? It's not even nibble-aligned, so you have to pull out a calculator just to match the packed numbers to ones you can read in documentation or lspci.
And to tie this back to my ancient comment above: Linux doesn't care that it's awful, because linux probes everything dynamically anyway. All they care about is uniqueness, not utility. We want these values so we can use them.
But... why? Why can't we just make "bus", "device" and "function" integer properties and be done with it? I get that there's a specification, but... that's awful! Everyone can see that's awful, right? It's not even nibble-aligned, so you have to pull out a calculator just to match the packed numbers to ones you can read in documentation or lspci.
It is what it is, I'm not going to argue about why we need to conform to an existing specification.
And to tie this back to my ancient comment above: Linux doesn't care that it's awful, because linux probes everything dynamically anyway. All they care about is uniqueness, not utility. We want these values so we can use them.
The spec predates linux. For the Zephyr case, how the bits are encoded shouldn't mater since its static data. So a simple PCI_BDF_TO_DT_REG()
macro can hide all the weirdness.
(Probably too much typing for this subject. I'll disappear after this bit, I promise)
To be a little glib: does linux actually use these DTS files? I'm 94% certain that it does not, and that on effectively all deployed PCI devices[1] nothing looks at that packed B/D/F data and does its probing via VID/DID. In fact, I'll put better than even money down that the data that does exist in files we can find is going to be reliably incorrect in these settings, since they're never tested nor exercised.
I'm not anti-spec, but if we're going to bother trying to follow something it should be for a useful purpose and not just hobgoblin-mode compliance with weird historical minutiae. Broadly: if there are specifications we want to throw in the trash, this is surely one of them.
[1] i.e. literally nothing on a PC cares, so if there's a user it's some oddball Android integration, or WiFi router or whatever somewhere, but even then I can't think of any with PCI buses. The file you link to is a mostly-failed 13 year old attempt at pushing Lincroft/Moorestown at the "embedded box" world, so I can understand why it might have felt it needed that DTS file. But has anything since tried to repeat the mistake?
There are a number of non-x86 embedded systems with PCI and PCIe on linux that utilize devicetree and use these mechanisms. I've developed embedded PowerPC support on linux and absolutely used the devicetree to describe PCI/PCIe busses. So this is used, while the majority of times things are probed as you state on linux, there are still a number of embedded type systems that don't do that for whatever reason and have the information described in the devicetree.
I pointed at the x86 dts as an example, more to show what the data should look like, I can spend a few hours and extract a dozen different cases on ARM, PowerPC, etc that also exist.
(I think the encoding is stupid as well, and thought that 10+ years ago when we used it linux as it inherited the OF standard specs that I referenced). It is what it is.
@galak thanks for pointing out where the BDF information is encoded. I initially skipped most of that part since it seemed irrelevant. The spec feels quite convoluted and would benefit of some examples of how devices with a given BDF should look like in DTS format.
Looking at the falcon falls DTS encoding, let's take this example:
multimedia@4,0 {
compatible = "pci8086,2e5d.2",
"pci8086,2e5d",
"pciclass048000",
"pciclass0480";
reg = <0x12000 0x0 0x0 0x0 0x0>;
interrupts = <4 1>;
};
Do I understand correctly that the 4,0
is what the Open Firmware refers to as the Text Representation DD,F
, i.e. device number 4, function 0? And the bus number should be taken from some property of the parent node (PCI bus node)?
What do the five elements of the reg
property refer to in that case? I'd have imagined the "numerical representation" to show up there, but I don't see any references to five elements (only three) in the Open Firmware specification.
Btw, I don't see how the VID/DID encoding into the compatible string would help us, since that would mean that drivers would need to have it as part of their code too, which is something I don't think we want. That said, the Open Firmware spec does seem to define standard device-id
and vendor-id
properties, so we should be able to use those.
Do I understand correctly that the
4,0
is what the Open Firmware refers to as the Text RepresentationDD,F
, i.e. device number 4, function 0? And the bus number should be taken from some property of the parent node (PCI bus node)?
That is correct. However, the DD = 4, and F = 0, is also encoded in the reg
property and that is the best place to extract that data from.
What do the five elements of the
reg
property refer to in that case? I'd have imagined the "numerical representation" to show up there, but I don't see any references to five elements (only three) in the Open Firmware specification.
So the reg
have 4 cells because, the bus has 3 #address-cells
and 2 #size-cells
:
pci@3fc {
#address-cells = <3>;
#size-cells = <2>;
The #address-cells
is 3, as that is how pci "addresses" are encoded as described in numerical representation (phys.hi, phys.mid, phys.low). PCI being a 64-bit bus, causes #size-cells
to be 2 to describe any case of having a >4G size associated with an address.
For what you are wanting to encode the size cells in the reg
will always be 0. Its the case when the devicetree represents the actual PCI bus addresses that size would come into play.
@galak I started experimenting with this "standard" based encoding, however it seems the Zephyr device tree processing doesn't like the multiple zero values in the reg
property:
devicetree error: zero-sized 'reg' in <Node /pcie0/uart@19,2 in '/Users/jhedberg/src/zephyr/zephyr/misc/empty_file.c'> seems meaningless (maybe you want a size of one or #size-cells = 0 instead)
The DTS (for ehl_crb
board) I'm experimenting looks something like this:
pcie0 {
#address-cells = <3>;
#size-cells = <2>;
compatible = "intel,pcie";
ranges;
uart2: uart@19,2 {
compatible = "ns16550";
reg = <PCIE_CFG_REG(0,0x19,2) 0x0 0x0 0x0 0x0>;
vendor-id = <0x8086>;
device-id = <0x4b4d>;
@galak one thing I don't quite get when looking at the falcon falls DTS in more detail, is that most PCI peripherals in it use "I/O space" (ss = 01
) encoding for their reg property, when I think "Config space" (ss = 00
) is the appropriate one. They also leave the phys.lo
value as zero in every case for the I/O address, which doesn't make much sense to me.
@galak I also get this error if I fix up the "zero address" stuff and let the processing continue, i.e. Zephyr device tree parsing doesn't like the way the PCI DTS spec places the Text Representation as the unit address:
devicetree error: <Node /pcie0/uart@19,2 in '/Users/jhedberg/src/zephyr/zephyr/build/zephyr/zephyr.dts.pre', binding /Volumes/case-sensitive/zephyr/zephyr/dts/bindings/serial/ns16550.yaml> has non-hex unit address
@galak @mbolivar-nordic any feedback on my last three comments? Are these incompatibilities with the OpenFirmware devicetree spec and the currently implemented Zephyr DTS validation/parsing something that can/should be fixed in the Zephyr devicetree implementation, or should we diverge from the OpenFirmware spec?
@galak one thing I don't quite get when looking at the falcon falls DTS in more detail, is that most PCI peripherals in it use "I/O space" (
ss = 01
) encoding for their reg property, when I think "Config space" (ss = 00
) is the appropriate one. They also leave thephys.lo
value as zero in every case for the I/O address, which doesn't make much sense to me.
Config space would be the PCI register config space vs general PCI MMIO access.
devicetree error: <Node /pcie0/uart@19,2 in '/Users/jhedberg/src/zephyr/zephyr/build/zephyr/zephyr.dts.pre', binding /Volumes/case-sensitive/zephyr/zephyr/dts/bindings/serial/ns16550.yaml> has non-hex unit address
This is an area that needs improvement in the edtlib parsing.
@galak @mbolivar-nordic any feedback on my last three comments? Are these incompatibilities with the OpenFirmware devicetree spec and the currently implemented Zephyr DTS validation/parsing something that can/should be fixed in the Zephyr devicetree implementation, or should we diverge from the OpenFirmware spec?
We need to enhance the tooling, NOT diverge from the OF spec.
reg = <PCIE_CFG_REG(0,0x19,2) 0x0 0x0 0x0 0x0>;
how are you defining PCIE_CFG_REG? I need to review the OF pci spec on the zero size.
reg = <PCIE_CFG_REG(0,0x19,2) 0x0 0x0 0x0 0x0>;
how are you defining PCIE_CFG_REG? I need to review the OF pci spec on the zero size.
@galak it expands to an integer value as per the OF spec:
#define PCIE_CFG_REG(bus, dev, func) ((bus) << 16 | (dev) << 11 | (func) << 8)
Btw, would the DT syntax allow having a preprocessor macro that expands to the entire list of values between <
and >
, i.e. including the 0x0
values? I wasn't sure, so for the purposes of testing I went with a macro that just expands to the first value (which is the only non-zero one as per the OF spec).
What's the best way to make progress with the Zephyr tooling? I suppose I should open two new GitHub issues for the identified shortcomings?
Btw, would the DT syntax allow having a preprocessor macro that expands to the entire list of values between
<
and>
, i.e. including the0x0
values? I wasn't sure, so for the purposes of testing I went with a macro that just expands to the first value (which is the only non-zero one as per the OF spec).
Sure, as long as the result of the preprocessor macro is valid DTS data.
What's the best way to make progress with the Zephyr tooling? I suppose I should open two new GitHub issues for the identified shortcomings?
As with all things, PRs are welcome to the tooling for the enhancements you need :).
( For the zero-size reg issue in edtlib):
device_type = "pci"
to dts/bindings/pcie/host/pcie-controller.yaml
(and update any dts files accordingly)devicetree error: zero-sized 'reg'
issue, we should update the check to ignore pci devices (do this by checking if the parent node has device_type = "pci"
.@galak the Devicetree spec (version 0.3) states:
The device_type property was used in IEEE 1275 to describe the device’s FCode programming model. Because DTSpec does not have FCode, new use of the property is deprecated, and it should be included only on cpu and memory nodes for compatibility with IEEE 1275–derived devicetrees.
So do we really want to extend our usage of it? I didn't (quickly looking) find any place in the specification that would forbid zero values for the register size, so maybe this check could be removed entirely from edtlib?
So do we really want to extend our usage of it? I didn't (quickly looking) find any place in the specification that would forbid zero values for the register size, so maybe this check could be removed entirely from edtlib?
I think its good to have, we can change it to a warning and look to add a check if its a pci device to ignore it.
@jhedberg can you file a bug or bugs with reproducers for your edtlib issues and assign to me?
We decided to go with with a centralised single scan of PCIe devices that builds a table of VID/DID and matching BDF values for all discovered devices. This means that the reg
property is no-longer used in PCIe DTS nodes, i.e. merging the PR #52107 should resolve this issue.
Is your enhancement proposal related to a problem? Please describe. For PCI devices, DT_REG_ADDR() and DT_REG_SIZE() do not contain the base address and size of MMIO regions. Instead, they respectively correspond to the PCI-e bus device function and ID values. See soc/x86/apollo_lake/soc.c to see what I mean, note how
device_mmu_region()
is used.This is not ideal as we are overloading information. One would expect these macros to always return a pointer and a size_t, and not a pcie_bdf_t/pcie_id_t pair in some cases but not others.
Describe the solution you'd like Some other dedicated DT macros for obtaining this information, perhaps. I don't have a good sense on the best way to approach this.