zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.8k stars 6.58k forks source link

Allow for selective deferment of device initialization (aka manual init) #39896

Closed palchak-google closed 1 month ago

palchak-google commented 3 years ago

Is your enhancement proposal related to a problem? Please describe. Zephyr currently assumes that all devices available in the system should be automatically initialized at boot. This assumption does not account for the the following common embedded scenarios:

Describe the solution you'd like Add an optional parameter to the Devicetree bindings for device nodes that allows application developers to specify whether initialization for that device should be deferred. This information should be stored in the init_entry struct for each device and checked in the z_sys_init_runlevel function before calling the init function for the device.

Devices with deferred initialization would then need to be manually initialized by the application code prior to use; Zephyr would provide a system call for manually invoking a device's initialization function (after checking that the device was not already initialized).

Describe alternatives you've considered An alternative solution would be to add a "deferred" status enumeration for devices, alongside "okay" and "disabled". This status indication would serve the same purpose as described above, namely inhibiting automatic boot-time execution of the device's initialization function. At compile time the build system could also check if any "okay" nodes appear to depend on other nodes marked as "deferred". This would help catch potential initialization defects.

tbursztyka commented 2 years ago

I don't think it would require an init_entry information here, another init level could just do: #define _SYS_INIT_LEVEL_MANUAL in init.h for instance. (thus a MANUAL to use in drivers). kernel/device.c would need relevant update of course. And a device_init() function exposed obviously (which would check that the device is being in the MANUAL level, else it would return -EALREADY or current init status). This solution would not make any structure to grow at least.

tokolist commented 2 years ago

Zephyr would provide a system call for manually invoking a device's initialization function (after checking that the device was not already initialized).

I believe there should not be a check if driver is already initialized. I faced issue when LCD should be reinitialized after cutting off LCD (and other devices) power to save battery life. And there is no way to reinitialize screen driver. Your idea is good, just if Zyphyr checks if driver was initialized it won't allow to reinitialize driver.

As an option there should be function to unload/destroy driver before doing something so in that case mentioned check makes sense.

palchak-google commented 2 years ago

The issue with no checking is that then it's possible for the system to get into an initialization loop where driver A depends on driver B, both require manual initialization, and each call the other's init method in their own init method. Admittedly this is a corner case, but so is the case of needing to re-init hardware that was powered off.

As an alternative, I'd propose that function that is called to manually invoke an init function take an argument to allow a "forced" initialization. A forced init would run the routine regardless of the current init state. This would be useful not just for re-initializing when the device is known to be uninitialized, but also for situations in which a device seems unresponsive and needs to be recovered.

I like the idea of a MANUAL init level. That's cleaner than a separate flag.

tbursztyka commented 2 years ago

I believe there should not be a check if driver is already initialized. I faced issue when LCD should be reinitialized after cutting off LCD (and other devices) power to save battery life. And there is no way to reinitialize screen driver. Your idea is good, just if Zyphyr checks if driver was initialized it won't allow to reinitialize driver.

As an option there should be function to unload/destroy driver before doing something so in that case mentioned check makes sense.

That's being discussed in this issue #20012

tbursztyka commented 2 years ago

As an alternative, I'd propose that function that is called to manually invoke an init function take an argument to allow a "forced" initialization. A forced init would run the routine regardless of the current init state. This would be useful not just for re-initializing when the device is known to be uninitialized, but also for situations in which a device seems unresponsive and needs to be recovered.

This would also fall within issue #20012

bvm84 commented 1 year ago

In my scenario LCD ST7785R needs first to be powered up by GPIO pins switching in main function. But ST7785R has POST_KERNEL initialization priority. So initialization fails because LCD do not respond on spi bus due to lack of power. Any suggestion how to fix this in Zephyr version: 3.2.99 before permanent solution will rise?

carlescufi commented 1 year ago

Architecture WG:

keith-zephyr commented 1 year ago

From the Arch WG, there were 3 possible approaches discussed. Summaries of three options are presented below.

All approaches will need to consider dependencies for any deferred nodes. For instance, if a node is deferred children/dependent nodes should also be deferred. We need to decide whether the application is responsible for initialization of all dependent nodes, or this will be handled automatically.

Option 1: new devictree property

Add a new property zephyr,deferred-init. This is described here.

Advantages

Disadvantages

Option 2: Set status property to "disabled"

Use status = "disabled"; to mark devicetree node instances that should be skipped by system initialization.

To exclude a devicetree node or driver instance, board overlay files need to use the /delete-node/.

Advantages

Indicates that the device is not presently operational, but it might become operational in the future (for example, something is not plugged in, or switched off). Refer to the device binding for details on what disabled means for a given device.

Disadvantages

Option 3: Create new value for status property for deferred nodes

Use the status property, but define a new value, "deferred", that is distinct from "disabled" or "reserved".

This is essentially a compromise between options 1 and 2.

Advantages

Disadvantages

nashif commented 1 year ago

@keith-zephyr thank you for the summary.

after grepping for status = "disabled" in the tree and seeing this go in the thousands, if I had to rank the options....

  1. options 3
  2. option 1
  3. options 2

Not sure how acceptable it would be introduce a new zephyr specific status in this case though.

gmarull commented 1 year ago

I doubt option 3 is a viable option unless we convince dt spec maintainers to add a new valid status to the standard. From the latest spec (0.4):

The status property indicates the operational status of a device. The lack of a status property should be treated as if the property existed with the value of "okay". Valid values are listed and defined in Table 2.4.

image

teburd commented 1 year ago

I doubt option 3 is a viable option unless we convince dt spec maintainers to add a new valid status to the standard. From the latest spec (0.4):

The status property indicates the operational status of a device. The lack of a status property should be treated as if the property existed with the value of "okay". Valid values are listed and defined in Table 2.4.

image

Reading this there's actually no status that matches what Zephyr uses the disabled status for today. Maybe whats needed is a new zephyr specific property that determines inclusion or exclusion of the build instead of relying on the status being disabled to do that.

nashif commented 1 year ago

Reading this there's actually no status that matches what Zephyr uses the disabled status for today.

exactly, we have been using "disabled" the wrong way, which IMO is worse than using our own status I guess.

teburd commented 1 year ago

Reading this there's actually no status that matches what Zephyr uses the disabled status for today.

exactly, we have been using "disabled" the wrong way, which IMO is worse than using our own status I guess.

It would be a very big ball of twine to untangle at this point to follow the spec the way say... Linux does.

"not presently operational, but might become operational in the future"

I mean this to me could be interpreted a bit. Maybe we just need our own attribute saying yes, disabled means not presently operational, but if this other property is set then in that case its possible to become operational later with some init/deinit type (module load/unload like) functionality.

I think option 1 makes the most sense, disabled still could be interpreted to be right then but Zephyr having its own needs to try and reduce image size takes the default to mean "never built in" unless the additional property is there.

~TL;DR - I vote Option 1~

gmarull commented 1 year ago

Well, our usage of "disabled" doesn't contradict the spec (device not presently operational). The only thing is that in Zephyr, the "might become operational" will never happen (not technically wrong). The thing is that disabled as is defined, seems to cover the deferred initialization case. For us, this is really a problem because disabled implies no device resource allocation at all. Also, "okay" + "zephyr,deferred-init" seems wrong, we'd be better off with "disabled" + "zephyr,deferred-init" (flag would be the mechanism to enable the "might become operational" part).

nashif commented 1 year ago

Also, "okay" + "zephyr,deferred-init" seems wrong, we'd be better off with "disabled" + "zephyr,deferred-init" (flag would be the mechanism to enable the "might become operational" part).

yes, that works better and there will not be any need for a Kconfig.... Let's call this "option 4".

Well, our usage of "disabled" doesn't contradict the spec (device not presently operational).

I am sure many of the thousands of disabled nodes we have are leftovers and things that will NEVER be operational, no way for us to verify because they are just being skipped and ignored....

henrikbrixandersen commented 1 year ago

I like option 4.

nashif commented 1 year ago

For context, some of the old/related issues:

nashif commented 1 year ago

Option 4: new devictree property with disabled status

Add a new property zephyr,deferred-init similar to option 1, but on disabled nodes. This is a combo of options 1 and 2

Advantages

Disadvantages

teburd commented 1 year ago

updating my vote. Option 4

nashif commented 1 year ago

updating my vote. Option 4

ditto

keith-zephyr commented 1 year ago

I'm good with option 4 as well. It's debatable as to whether option 1 or option 4 is "easiest to implement", but the other advantages make it the clear winner.

@galak - What are your thoughts on these options?

palchak-google commented 1 year ago

I like option 4, but with a slight rewording of the option name and a slight change to its meaning. Consider a scenario where we a DT file for a development board that lists all peripherals as "available", but some of these peripherals are mutually exclusive (because they use the same HW resources under-the-hood). E.g.

/{
    uart0: uart@1a00 {
        status = "disabled";
        zephyr,deferred-init;
    };

    spi0: spi@1a00 {
        status = "disabled";
        zephyr,deferred-init;
    };
};

Now, if my app-specific DT overlay file, I want to specify that I'm using UART0 for logging:

/ {
    chosen {
        zephyr,console = &uart0;
    };
}

&uart0 {
    status = "okay";
};

What should be the outcome in this situation? The UART0 node has both status="okay" and zephyr,deferred-init properties. So when is UART0 initialized, automatically at boot or later by the application? Or is this just an error detected at compile time?

I would suggest that the property we add be called zephyr,deferred_available (or zephyr,deferred_init_available). The semantics of the property should simply be: 1) If the property is present, all code should included in the build that is required to initialize (including de-init and re-init) and use the device 2) If the property is absent, the code required to init/use the device is optionally included based on the device status (this is what happens today).

With this change, the following combinations of DT properties are valid:

Status zephyr,deferred_available Code for device use Code for device de-init/re-init First init
okay X X X boot-time
okay X boot-time
disabled X X X app
disabled --

IMHO this makes it easier to adopt usage of the new property. DT overlay files that change only a node's status will continue to "do the right thing" rather than potentially putting the node into an invalid state. The property name then also makes it clear that the property is only affecting what code is included/excluded from the build, and that the init timing is still exclusively dictated by the status property.

palchak-google commented 1 year ago

Or, as a third option for the name: zephyr,init-deferrable. I like this the best. It's still short, it explicitly pertains to initialization, and it indicates that deferral is only possible but not guaranteed.

henrikbrixandersen commented 1 year ago

What should be the outcome in this situation? The UART0 node has both status="okay" and zephyr,deferred-init properties. So when is UART0 initialized, automatically at boot or later by the application? Or is this just an error detected at compile time?

I'd say that would either be a build-time error or a build-time warning (along with ignoring the zephyr,deferred-init property). Any node with status "okay" should be initialized at boot.

palchak-google commented 1 year ago

Any node with status "okay" should be initialized at boot.

I agree with this.

I'd say that would either be a build-time error or a build-time warning (along with ignoring the zephyr,deferred-init property).

I disagree with this. This makes overlay files more complicated, as changing a node from "available, but not automatically initialized" to "available, automatically initialized" requires both setting the status and deleting the zephyr,deferred-init. Requiring two modifications for what is arguably one of the more common operations of an overlay seems unnecessary.

I think the status and zephyr,deferrable-init properties should be independent and thus any combination of them is permissible and offer intuitive behavior.

gmarull commented 1 year ago

I think the status and zephyr,deferrable-init properties should be independent and thus any combination of them is permissible and offer intuitive behavior.

Note that zephyr,deferrable-init just says init is deferrable, but how do you turn it on/off? You need something else.

gmarull commented 1 year ago

Note: option 4 will force us to:

(1) change Kconfig DT_HAS_XXX_ENABLED to be y if we have the okay or disabled + zephyr,deferred-init (2) change in all drivers DT_INST_FOREACH_STATUS_OKAY(XXX_DEFINE) to something like DEVICE_DT_INST_FOREACH_ENABLED(XXX_DEFINE), so that it gets allocated with either okay or disabled + zephyr,deferred-init. Note that it would only be a requirement to use deferred init on a device, if not used, existing code would continue to work (ie semi-breaking change).

gmarull commented 1 year ago

Another note: I think zephyr,deferred-init should automatically propagate to all children nodes.

gmarull commented 1 year ago

I like option 4, but with a slight rewording of the option name and a slight change to its meaning. Consider a scenario where we a DT file for a development board that lists all peripherals as "available", but some of these peripherals are mutually exclusive (because they use the same HW resources under-the-hood). E.g.

/{
    uart0: uart@1a00 {
        status = "disabled";
        zephyr,deferred-init;
    };

    spi0: spi@1a00 {
        status = "disabled";
        zephyr,deferred-init;
    };
};

I think a better usage of devicetree here would be (assuming this is a kind of "multi-serial" IP):

serial0: serial@1a00 {
        /* set this compatible to use the IP for SPI */
        compatible = "vnd,serial-spi";
       /* set this compatible to use the IP for UART */
        compatible = "vnd,serial-uart";
};

But I guess that would complicate things a bit more. The reason is that you cannot instantiate > 1 device per DT node, so instantiating both (setting both compatibles?) + deferring init would not work.

keith-zephyr commented 1 year ago

Note: option 4 will force us to:

(1) change Kconfig DT_HAS_XXX_ENABLED to be y if we have the okay or disabled + zephyr,deferred-init (2) change in all drivers DT_INST_FOREACH_STATUS_OKAY(XXX_DEFINE) to something like DEVICE_DT_INST_FOREACH_ENABLED(XXX_DEFINE), so that it gets allocated with either okay or disabled + zephyr,deferred-init. Note that it would only be a requirement to use deferred init on a device, if not used, existing code would continue to work (ie semi-breaking change).

Regarding the FOREACH_STATUS_OKAY macros, I agree changing the name to FOREACH_ENABLED is clearer. This transition could be accomplished by having FOREACH_STATUS_OKAY call the equivalent FOREACH_ENABLED and then mark the FOREACH_STATUS_OKAY as deprecated.

palchak-google commented 1 year ago

Note that zephyr,deferrable-init just says init is deferrable, but how do you turn it on/off? You need something else.

That's what the status property does, same as it does today.

Another note: I think zephyr,deferred-init should automatically propagate to all children nodes.

This is probably a good idea. Though it should still be a compile time warning if a node marked with status = okay depends on a node marked status = disabled, regardless of deferrable-init.

The DT_HAS_XXX_ENABLED macros should probably be renamed DT_HAS_XXX_AVAILABLE, though as Keith suggested there will be a transition period where the two are largely synonymous.

palchak-google commented 1 year ago

I think a better usage of devicetree here would be (assuming this is a kind of "multi-serial" IP):

The example provided is actually derived from how the DT is written for Nordic's nRF9160. See here for an example.

carlescufi commented 7 months ago

Architecture WG:

Slide shown:

image
palchak-google commented 7 months ago

There are three major problems I see with the callback-based solution:

1) Disproportionate run-time overhead (in current RFC implementation)

The runtime overhead (in the current RFC implementation) is terrible. For each device in the entire system (not just the devices that are optionally present) the init code iterates through the entire list of "before init" callbacks. So for a system with ~30 devices total (remember, this includes all of the devices in the system, including all of the peripherals built-in to the SoC) and 5 optional devices, the init stage will have to make around 150 additional (indirect) function calls. And this is just to get the same information that in the best case could be obtained from five function calls.

2) The stated benefits are marginal or non-existent

The motivating benefit for the callback-based approach was that it potentially maintains the initialization sequence computed at compile time based on device-tree dependencies. The claim was that this would then let devices that are present (but might not have been) benefit from the "priority validation" performed during the compile-time devicetree processing. Except that it doesn't always.

To understand why, consider a scenario where the system has an optionally present UART bus and attached to that bus an optionally present GPS module. The presence of the UART bus depends on a board stuffing option (detectable via a GPIO), and the presence of the GPS module depends the board stuffing as well as whether the GPS data lines are connected (detected via a non-break condition on the UART). This system requires two "before init" callbacks, one for the UART and one for the GPS. The presence-detect callback for the UART seems pretty straightforward to implement. All it has to do is configure a GPIO as an input, read the GPIO, and then decide whether the UART is present or not. Except that the GPIO controller may or may not have been initialized yet, because there is no dependency in devicetree between the GPIO controller and the UART bus. Why not? Because the SoC vendor that wrote both drivers couldn't have predicted this situation. This dependency is unique to this particular product. So now the UART presence-detect callback has to check if the GPIO controller is already initialized, and if it isn't, then what?

Moving on, consider then the presence-detect callback for the GPS unit. This callback needs the UART to be present and initialized in order to detect the presence of the GPS unit. That's fine. Except that the callback doesn't automatically know which UART to check (for presence and init), so it has to pull that information from devicetree. So the GPS callback has to look at devicetree, determine what UART the GPS is on, and then determine if that UART is present. This means that callback reproduces the devicetree dependency information. Also, the GPS needs to check the board stuffing options. And, same as with the UART, checking the stuffing options depends on the GPIO controller. Furthermore, it's now possible for the stuffing options detection to be duplicated (and thus deviate) in two different callbacks. Or the the stuffing optoins detection is pulled out to a separate utility function, but one that has to be carefully maintained because it might be called before kernel init (as opposed to during normal "thread" context).

3) Devicetree no longer accurately models the hardware in the system

One of the key strenghts of devicetree is that it allows downstream consumers of a generic piece of hardware to customize the software to match a specific usage and configuration. Consider a organization where there is a central "platforms team" that produces a development board with the main SoC and memory and several connectors for plugging in interchangeable daughter boards for the various peripherals. This platforms team also produces a BSP for the main dev board, and that BSP registers all of the "before init" callbacks to detect the daughterboard configuration. Fast forward six months to the second rev of the development board, and by this time some of the peripherals have been finalized. These peripherals are moved off of the daughterboards onto the main board. Now the BSP for the rev 2 hardware needs a way to "unregister" some of the "before init" callbacks that are used for the rev 1 hardware. Or, the BSP needs to be changed to optionally register callbacks based on the board revision. This makes for two different places where different information is stored about which hardware is present on a given board. The devicetree for rev 2 largely mirrors that of rev 1, as its the same peripherals on the same buses (just some are now always present on rev 2). But elsewhere in a source file in the BSP is the preprocessor logic that spells out which peripherals are always present in rev 2 and which remain optionally present (aka auto-detected).

This problem becomes worse when the team that is consuming the development board (and BSP) are not in the same organization as the team building the dev board (and writing the BSP). In such a situation the downstream team will be burdened with maintaining a fork of the upstream BSP if the downstream team wants to take any device that the BSP treats as "optionally present" and make that device "always present". There is no way for the downstream team to "unregister" a "before init" callback that the BSP has registered, unless the BSP provides a Kconfig option to control registration of each callback. The BSP has resort to using Kconfig options to model parts of the hardware instead of devicetree, which is confusing. In devicetree, a downstream consumer can provide an overlay that completely overrides any property set in an upstream file. The callback approach does not provide such override flexibility, making it less general and more difficult to maintain over time and between teams.

Coda

The name "before init" callback is pretty terrible. "Pre-init" callback is at least a little better. The best, IMHO, would be to reframe the question from "should this device be initialized" to "is this device present", and thus have a "device present" callback. I also think this reframing applies to the naming for the devicetree property as proposed above. That is, rather than a zephyr,deferred-init property it should be zephyr,optionally-present. The concept of an "optionally present" device much more cleanly and clearly expresses the hardware model. Deferred initialization is then just one of the subordinate concerns that arises when supporting an "optionally present devices" feature.

fabiobaltieri commented 7 months ago

The specific use case can be covered by both approaches, have a generic devicetree node that just creates the callback for skipping and then init manually later, no difference.

The run-time overhead though I think it's way more interesting, let's dig on that bit more. I saw it brought up few times at this point with no data backing the argument, let's see: I'd expect callbacks to be structured something like "if this is not my device return; else do something". Say we have a 1000 devices and 3 callbacks:

static bool check_1(const struct device *dev)
{
        int ret;

        if (dev != (void *)100) {
                return false;
        }

        ret = gpio_pin_get_dt(&led);
        LOG_INF("pin is: %d", ret);

        return true;
}

static bool check_2(const struct device *dev)
{
        if (dev != (void *)110) {
                return false;
        }

        return true;
}

static bool check_3(const struct device *dev)
{
        if (dev != (void *)120) {
                return false;
        }

        return true;
}

static bool (*checks[])(const struct device *dev) = {
        check_1, check_2, check_3,
};

int main(void)
{
        while (1) {
                uint32_t ts = k_uptime_get_32();

                for (uint32_t i = 0; i < 1000; i++) {
                        for (uint8_t j = 0; j < ARRAY_SIZE(checks); j++) {
                                bool out;
                                bool (* fn)(const struct device *dev) = checks[j];

                                out = fn((const struct device *)i);
                                if (out) {
                                        LOG_INF("check %d", j);
                                }
                        }
                }

                LOG_INF("delta: %dms", k_uptime_get_32() - ts);

                k_msleep(SLEEP_TIME_MS);
        }

        return 0;
}
[00:00:03.667,968] <inf> main: pin is: 0                                                                    
[00:00:03.667,968] <inf> main: check 0                                                                      
[00:00:03.667,999] <inf> main: check 1                                                                      
[00:00:03.668,029] <inf> main: check 2                                                                      
[00:00:03.669,036] <inf> main: delta: 2ms                                                                   
[00:00:03.769,256] <inf> main: pin is: 0                                                                    
[00:00:03.769,287] <inf> main: check 0                                                                      
[00:00:03.769,317] <inf> main: check 1                                                                      
[00:00:03.769,317] <inf> main: check 2                                                                      
[00:00:03.770,324] <inf> main: delta: 1ms                                                                   

This is on an nRF52dk. Is this really a "disproportionate overhead"? Remember that we are talking about a one-off sequence at boot time, that is interleaved with driver init calls which not uncommonly include sleeps of tens if not hundred of millliseconds, not to mention this specific SoC in the default configuration does not even hit main in less than 250ms (I think it's waiting for some clock to stabilize).

Is my test flawed or unrepresentative?

palchak-google commented 7 months ago

The specific use case can be covered by both approaches, have a generic devicetree node that just creates the callback for skipping and then init manually later, no difference.

I'm assuming that you're referring to the BSP use case described in point 3? No, this cannot be covered by both approaches. The callback approach has no proposed or implemented explicit interaction with devicetree. There is no mechanism, again proposed or implemented, by which a callback can be added (or removed) by adding or removing a corresponding devicetree node. The only proposed feature of the callback approach is that any source file may add a callback. There is no mechanism for removing a callback in a downstream project except changing the upstream source.

... that is interleaved with driver init calls which not uncommonly include sleeps of tens if not hundred of millliseconds...

So the argument is that something else will surely be a longer pole in the tent, so it doesn't matter what the performance is of this as long as it's not the worst performing step? I can't agree with this type of thinking. Bad performance is almost always the accumulation of many small decisions that were made with the mentality of "this is just slightly wasteful, so it won't matter".

Is this really a "disproportionate overhead"?

You didn't measure a proportion, you measured the absolute overhead. What you then need to measure is the overhead of calling each of those three callbacks exactly once. This will give you the performance of the ideal solution. Then you compare those two numbers. This is where you see the disproportion in the overhead.

I'd expect callbacks to be structured something like "if this is not my device return; else do something"

That certainly is would be the most efficient structure for a callback, but there's nothing to guarantee this. And these callbacks will be written not by the folks who are often writing low level driver code, nor by folks who are likely even familiar with the init machinery and the significant penalty (1000x in the provided example) for inefficiency. Rather, these callbacks will be written by developers more focused on "gluing together" the right pieces of Zephyr to get something that boots on their custom boards. It's a big, well-hidden footgun.


And speaking of footguns, the provided example perfectly demonstrates another one: no features that depend on the kernel may be used in any of these callbacks. Thus, even a seemingly benign kprintf call or direct write to a memory mapped register to toggle a GPIO call for debugging could hose a system. The reason being that these callbacks have no knowledge of what init level they're running as. This means they could be running at EARLY or PRE_KERNEL_{1,2}. And there's not even a way for a callback to check what the current init level is.

There are multiple variants of this footgun; I described a different variant in my point 2. That is, within a callback the state of the system is relatively opaque. The callback doesn't know what init level(s) have been completed, whether the kernel is available, or whether any other hardware drivers that might be needed (such as GPIO controller, internal EEPROM controller, or even whether the clock has been configured for the peripheral bus hosting a particular memory-mapped register). Driver authors don't have to consider all these subtleties, because drivers benefit from the guarantee that their dependencies will be initialized before the driver is called to initialize.

nashif commented 7 months ago

And speaking of footguns, the provided example perfectly demonstrates another one: no features that depend on the kernel may be used in any of these callbacks. Thus, even a seemingly benign kprintf call or direct write to a memory mapped register to toggle a GPIO call for debugging could hose a system. The reason being that these callbacks have no knowledge of what init level they're running as. This means they could be running at EARLY or PRE_KERNEL_{1,2}. And there's not even a way for a callback to check what the current init level is. There are multiple variants of this footgun; I described a different variant in my point 2. That is, within a callback the state of the system is relatively opaque. The callback doesn't know what init level(s) have been completed, whether the kernel is available, or whether any other hardware drivers that might be needed (such as GPIO controller, internal EEPROM controller, or even whether the clock has been configured for the peripheral bus hosting a particular memory-mapped register). Driver authors don't have to consider all these subtleties, because drivers benefit from the guarantee that their dependencies will be initialized before the driver is called to initialize.

good point, I wonder how all of this would work with userspace enabled and the fact that callbacks in userspace are never trusted. @edersondisouza @dcpleung does this apply here?

dcpleung commented 7 months ago

IIRC, there was a PR introducing syscalls to set callbacks for a driver, so userspace can manipulate callbacks. However, it was agreed that it was a heavily frowned upon behavior, as the callback would be running in privileged mode (... which bypasses all userspace guards). So, in general, never let userspace do driver callbacks.

edersondisouza commented 7 months ago

And speaking of footguns, the provided example perfectly demonstrates another one: no features that depend on the kernel may be used in any of these callbacks. Thus, even a seemingly benign kprintf call or direct write to a memory mapped register to toggle a GPIO call for debugging could hose a system. The reason being that these callbacks have no knowledge of what init level they're running as. This means they could be running at EARLY or PRE_KERNEL_{1,2}. And there's not even a way for a callback to check what the current init level is. There are multiple variants of this footgun; I described a different variant in my point 2. That is, within a callback the state of the system is relatively opaque. The callback doesn't know what init level(s) have been completed, whether the kernel is available, or whether any other hardware drivers that might be needed (such as GPIO controller, internal EEPROM controller, or even whether the clock has been configured for the peripheral bus hosting a particular memory-mapped register). Driver authors don't have to consider all these subtleties, because drivers benefit from the guarantee that their dependencies will be initialized before the driver is called to initialize.

good point, I wonder how all of this would work with userspace enabled and the fact that callbacks in userspace are never trusted. @edersondisouza @dcpleung does this apply here?

At #67877 the callback is set up statically - there's a syscall to run the device initialisation, not add the callback, so I don't think this specific issue applies here.

edersondisouza commented 7 months ago

If we're fearing people abusing/misusing the callback, I guess the issue is insurmountable, unless we have some sort of restricted environment - eBPF? @palchak-google can I count you as favouring #67335 approach?

palchak-google commented 7 months ago

@edersondisouza Yes, you can count me as preferring the direction of #67335, though the implementation needs more iteration. As it stands it has its own footguns (silent failure of init if dependency was deferred) and sharp corners (no de-init).

edersondisouza commented 7 months ago

@edersondisouza Yes, you can count me as preferring the direction of #67335, though the implementation needs more iteration. As it stands it has its own footguns (silent failure of init if dependency was deferred) and sharp corners (no de-init).

Cool - and yes, it's not like the PR doesn't need more reviews (although de-init is maybe a next step) - it's more about the direction. Thanks for the feedback @palchak-google =D

edersondisouza commented 7 months ago

@fabiobaltieri any more inputs here?

fabiobaltieri commented 7 months ago

@edersondisouza I think I voiced all my arguments, happy to expand on anything specific if you feel like there's some open point but otherwise at this stage there's probably more value in hearing for other people opinions.

henrikbrixandersen commented 5 months ago

Can this be closed now that #67335 has been merged?